(54) Title: MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS HAVING INCREASED CELLULAR FLUORESCENCE
(57) Abstract 

The present invention is directed to mutants of the jellyfish Aequorea victoria green fluorescent protein (GFP) having at least 5 and
preferably greater than 20 times the specific green fluorescence of the wild type protein. In other embodiments, the invention comprises
mutant blue fluorescent proteins (BFPs) that emit an enhanced blue fluorescence. The invention also encompasses the expression of
nucleic acids that encode a mutant GFP or BFP in a wide variety of engineered host cells, and the isolation of engineered proteins having
increased fluorescent activity. The novel mutants of the present invention allow for a significantly more sensitive detection of fluorescence
in engineered host cells than is possible with GFP or with its known mutants. Thus, the mutant fluorescent proteins provided herein can be
used as sensitive reporter molecules to detect the cell and tissue-specific expression and subcellular compartmentalization of GFP or BFP 
mutants, or of chimeric proteins comprising GFP or BFP mutants fused to a regulatory sequence or to a second protein sequence. 
MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS
HAVING INCREASED CELLULAR FLUORESCENCE 

FIELD OF THE INVENTION

This invention generally relates to novel proteins 
and their production which are useful for detecting gene 
expression and for visualizing the subcellular targeting and 
distribution of selected proteins and peptides, among other 
things. The invention specifically relates to mutations in 
the gene coding for the jellyfish Aequorea victoria green
fluorescent protein ("GFP"), which mutations encode mutant GFP
proteins having either an enhanced green or a blue 
fluorescence, and uses for them. 

BACKGROUND OF THE INVENTION 

Green fluorescent protein ("GFP") is a monomeric 
protein of about 27 kDa which can be isolated from the 
bioluminescent jellyfish Aequorea victoria. When wild type 
GFP is illuminated by blue or ultraviolet light, it emits a 
brilliant green fluorescence. Similar to fluorescein 
isothiocyanate, GFP absorbs ultraviolet and blue light with a 
maximum absorbance at 395 nm and a minor peak of absorbance at
470 nm, and emits green light with a maximum emission at 509
nm with a minor peak at 540 nm. GFP fluorescence persists
even after fixation with formaldehyde, and it is more stable 
to photobleaching than fluorescein. 

The gene for GFP has been isolated and sequenced. 
Prasher, D. C. et al. (1992), "Primary structure of the
Aequorea victoria green fluorescent protein," Gene 111:229-233.
Expression vectors that comprise the GFP gene or cDNA
have been introduced into a variety of host cells. These host
cells include: Chinese hamster ovary (CHO) cells, human
embryonic kidney cells (HEK293), COS-1 monkey cells, myeloma
cells, NIH 3T3 mouse fibroblasts, PtK1 cells, BHK cells, PC12
cells, Xenopus, leech, transgenic zebra fish, transgenic mice,
Drosophila and several plants. The GFP molecules expressed by
these different cells have a fluorescence similar to that of the
native molecules, demonstrating that GFP fluorescence does
not require any species-specific cofactors or substrates.
See, e.g., Baulcombe, D. et al. (1995), "Jellyfish green
fluorescent protein as a reporter for virus infections," The
Plant Journal 7:1045-1053; Chalfie, M. et al. (1994), "Green
fluorescent protein as a marker for gene expression," Science
263:802-805; Inouye, S. & Tsuji, F. (1994), "Aequorea green
fluorescent protein: expression of the gene and fluorescent
characteristics of the recombinant protein," FEBS Letters
341:277-280; Inouye, S. & Tsuji, F. (1994), "Evidence for
redox forms of the Aequorea green fluorescent protein," FEBS
Letters 351:211-214; Kain, S. et al. (1995), "The green
fluorescent protein as a reporter of gene expression and
protein localization," BioTechniques (in press); Kitts, P. et
al. (1995), "Green Fluorescent Protein (GFP): A novel reporter
for monitoring gene expression in living organisms,"
CLONTECHniques X(1):1-3; Lo, D. et al. (1994), "Neuronal
transfection in brain slices using particle-mediated gene
transfer," Neuron 13:1263-1268; Moss, J. B. & Rosenthal, N.
(1994), "Analysis of gene expression patterns in the embryonic
mouse myotome with the green fluorescent protein, a new vital
marker," J. Cell. Biochem., Supplement 18DW161; Niedz, R. et
al. (1995), "Green fluorescent protein: an in vivo reporter of
plant gene expression," Plant Cell Reports 14:403-406; Wu,
G.-I. et al. (1995), "Infection of frog neurons with vaccinia
virus permits in vivo expression of foreign proteins," Neuron
14:681-684; Yu, J. & van den Engh, G. (1995), "Flow-sort and
growth of single bacterial cells transformed with cosmid and
plasmid vectors that include the gene for green-fluorescent
protein as a visible marker," Abstracts of papers presented at
the 1995 meeting on "Genome Mapping and Sequencing," Cold
Spring Harbor, p. 293.

The active GFP chromophore is a hexapeptide which 
contains a cyclized Ser-dehydroTyr-Gly trimer at positions 65-67.
This chromophore is only fluorescent when embedded
within the intact GFP protein. Chromophore formation occurs 
post-translationally; nascent GFP is not fluorescent. The 
chromophore is thought to be formed by a cyclization reaction 
and an oxidation step that requires molecular oxygen. 

Proteins can be fused to the amino (N-) or carboxy 
(C-) terminus of GFP. Such fused proteins have been shown to 
retain the fluorescent properties of GFP and the functional 
properties of the fusion partner. Bian, J. et al. (1995),
"Nuclear localization of HIV-1 matrix protein P17: The use of
A. victoria GFP in protein tagging and tracing," FASEB J.
9:A1279; Flach, J. et al. (1994), "A yeast RNA-binding
protein shuttles between the nucleus and the cytoplasm," Mol.
Cell. Biol. 14:8399-8407; Marshall, J. et al. (1995), "The
jellyfish green fluorescent protein: a new tool for studying
ion channel expression and function," Neuron 14:211-215;
Olmsted, J. et al. (1994), "Green Fluorescent Protein (GFP)
chimeras as reporters for MAP4 behavior in living cells," Mol.
Biol. of the Cell 5:167a; Rizzuto, R. et al. (1995), "Chimeric
green fluorescent protein as a tool for visualizing
subcellular organelles in living cells," Current Biol.
5:635-642; Sengupta, P. et al. (1994), "The C. elegans gene
odr-7 encodes an olfactory-specific member of the nuclear
receptor superfamily," Cell 79:971-980; Stearns, T. (1995),
"The green revolution," Current Biol. 5:262-264; Treinin, M. &
Chalfie, M. (1995), "A mutated acetylcholine receptor subunit
causes neuronal degeneration in C. elegans," Neuron 14:871-877;
Wang, S. & Hazelrigg, T. (1994), "Implications for bcd
mRNA localization from spatial distribution of exu protein in
Drosophila oogenesis," Nature 369:400-403.

A number of GFP mutants have been reported. 
Delagrave, S. et al. (1995), "Red-shifted excitation mutants of
the green fluorescent protein," Bio/Technology 13:151-154;
Heim, R. et al. (1994), "Wavelength mutations and
posttranslational autoxidation of green fluorescent protein,"
Proc. Natl. Acad. Sci. USA 91:12501-12504; Heim, R. et al.
(1995), "Improved green fluorescence," Nature 373:663-664.
Delagrave et al. (1995) Bio/Technology 13:151-154 isolated
mutants of cloned Aequorea victoria GFP that had red-shifted
excitation spectra. Heim, R. et al. (1994), "Wavelength
mutations and posttranslational autoxidation of green
fluorescent protein," Proc. Natl. Acad. Sci. USA 91:12501-12504
reported a mutant (Tyr66 to His) having a blue
fluorescence, which is herein designated BFP(Tyr67→His).
These references have neither taught nor suggested that their
mutations resulted in an increase in the cellular fluorescence
of the mutant GFPs.

In general, the level of fluorescence of a protein
expressed in a cell depends on several factors, such as number
of copies made of the fluorescent protein, stability of the
protein, efficiency of formation of the chromophore, and
interactions with cellular solvents, solutes and structures.
Although the fluorescent signal from wild type GFP or from the
reported mutants is generally adequate for bulk detection of
abundantly expressed GFP or of GFP-containing chimeras, it is
inadequate for detecting transient low or constitutively low
levels of expression, or for performing fine structural
subcellular localizations. This limitation severely restricts
the use of native GFP or of the reported mutants as a
biochemical and structural marker for gene expression and
morphological studies.

SUMMARY OF THE INVENTION

It is an object of the invention to provide engineered
GFP-encoding nucleic acid sequences that encode modified GFP
molecules having a greater cellular fluorescence than wild
type GFP or prior described recombinant GFP.

It is a further object of this invention to provide
recombinant vectors containing these modified GFP-encoding
nucleic acid sequences, which vectors are capable of being
inserted into a variety of cells (including mammalian and
eukaryotic cells) and expressing the modified GFP.

It is also an object of this invention to provide 
host cells capable of providing useful quantities of 
homogeneous modified GFP. 

It is yet another object of this invention to
provide peptides that possess a greater cellular fluorescence 
than native GFP or unaltered recombinant GFP and that can be 
produced in large quantities in a laboratory, by a 
microorganism or by a cell in culture. 

These and other objects of the invention have been 
accomplished by providing mutant GFP-encoding nucleic acids 
whose gene product exhibits an increased cellular fluorescence 
relative to naturally occurring or recombinantly produced wild 
type GFP ("wtGFP"). In some embodiments, the modified GFPs 
possess fluorescent activity that is 50-100 fold greater than 
that of unmodified GFP. 

The modified proteins of the present invention are 
produced by making mutations in a genetic sequence that 
result in alterations in the amino acid sequence of the 
resulting gene product. Our starting material was a GFP-encoding
nucleic acid wherein a codon encoding an additional
amino acid was inserted at position 2 of the previously
published GFP amino acid sequence (Chalfie et al., 1994), to
introduce a useful restriction site. Due to the amino acid
insertion at position 2 of the GFP amino acid sequence, our
numbering of the GFP amino acids and description of the amino
acid mutations is off by one as compared to the originally
reported wild type GFP sequence (Prasher et al., 1992). Thus,
amino acid 65 by our numbering corresponds to amino acid 64 of
the originally reported wild type GFP, amino acid 168
corresponds to amino acid 167 of the originally reported wild
type GFP, etc.
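
The off-by-one relationship between the two numbering schemes can be sketched in code. The following is only an illustration and not part of the original disclosure: the function name `to_original_numbering` and the parameter `insertion_site` are hypothetical, and the sketch assumes that the additional residue occupies position 2 of the 239 amino acid sequence, as stated above.

```python
def to_original_numbering(position: int, insertion_site: int = 2) -> int:
    """Map a residue position in the 239-residue numbering used here to the
    originally reported 238-residue numbering (Prasher et al., 1992).

    Assumption: the inserted residue sits at `insertion_site` (position 2),
    so that residue has no counterpart in the original sequence.
    """
    if position < 1:
        raise ValueError("positions are 1-based")
    if position == insertion_site:
        raise ValueError("the inserted residue has no original counterpart")
    # Residues before the insertion keep their number; later ones shift by one.
    return position if position < insertion_site else position - 1


print(to_original_numbering(65))   # 64
print(to_original_numbering(168))  # 167
```

Positions 65 and 168 are the two examples given in the text; any other position downstream of the insertion shifts in the same way.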

Using the modified wild type GFP described herein, a 
number of the unique mutants described herein derive from the 
discovery of an unplanned and unexpected mutation called 
"SG12", obtained in the course of site-directed mutagenesis 
experiments, wherein a phenylalanine at position 65 of wtGFP 
was converted to leucine. A mutant referred to as "SG11,"
which combined the phenylalanine 65 to leucine alteration with
an isoleucine 168 to threonine substitution and a lysine 239
to asparagine substitution, gave a further enhanced
fluorescence intensity. The lysine 239 to asparagine
substitution does not affect the fluorescence of GFP; indeed
the C-terminal lysine or asparagine may be deleted without
affecting fluorescence. A third and further improved GFP
mutant was obtained by further mutating "SG11." This mutant
is referred to as "SG25" and, in addition to the SG11
mutations, contains an additional mutation, a substitution of
a cysteine at position 66 for the serine normally found at 
that position in the sequence. 

In addition, the invention encompasses novel GFP
mutants that emit a blue fluorescence. These blue mutants are
derived from a mutation of the wild type GFP (Heim, R. et al.
(1994), "Wavelength mutations and posttranslational
autoxidation of green fluorescent protein," Proc. Natl. Acad.
Sci. USA 91:12501-12504), in which histidine was substituted
for tyrosine at amino acid position 66. This mutant emits a
blue fluorescence, i.e., it becomes a Blue Fluorescent Protein
(BFP).

Novel BFP mutants having an enhanced blue
fluorescence were made by further modifying this
BFP(Tyr67→His). The introduction of the same mutation used to
generate SG12 (i.e., phenylalanine to leucine at position 65)
into BFP(Tyr67→His) resulted in a new mutant having a brighter
fluorescence, designated "SuperBlue-42" (SB42). A second
independently generated mutation of BFP(Tyr67→His), in which a
valine at position 164 was converted to alanine, also emitted
an enhanced blue fluorescent signal and is referred to as
"SB49." A combination of the above two mutations resulted in
"SB50", which exhibited an even greater fluorescence
enhancement than either of the previous mutations.
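
The relationships among the named mutants can be summarized as sets of substitutions. The sketch below is purely illustrative: the names `MUTANTS`, `BFP_PROTOTYPE` and `added_mutations` are hypothetical, and the one-letter shorthand (e.g. F65L for phenylalanine 65 to leucine) follows the numbering used in this application.

```python
# Substitutions carried by each named mutant, in the numbering used here.
BFP_PROTOTYPE = frozenset({"Y67H"})  # BFP(Tyr67→His)

MUTANTS = {
    "SG12": frozenset({"F65L"}),
    "SG11": frozenset({"F65L", "I168T", "K239N"}),
    "SG25": frozenset({"F65L", "S66C", "I168T", "K239N"}),
    "SB42": BFP_PROTOTYPE | {"F65L"},
    "SB49": BFP_PROTOTYPE | {"V164A"},
    "SB50": BFP_PROTOTYPE | {"F65L", "V164A"},
}


def added_mutations(child: str, parent: str) -> list:
    """Return the substitutions present in `child` but not in `parent`,
    ordered by residue position."""
    extra = MUTANTS[child] - MUTANTS[parent]
    return sorted(extra, key=lambda code: int(code[1:-1]))


print(added_mutations("SG25", "SG11"))  # ['S66C']
print(MUTANTS["SB50"] == MUTANTS["SB42"] | MUTANTS["SB49"])  # True
```

Representing each mutant as a set makes the stated derivations easy to check: SG25 is SG11 plus one substitution, and SB50 is the union of SB42 and SB49.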

The novel GFP and BFP mutants of this invention
allow for a significantly more sensitive detection of
fluorescence in host cells than is possible with the wild type
protein. Accordingly, the mutant GFPs provided herein can be
used, among other things, as sensitive reporter molecules to
detect the cell and tissue-specific expression and subcellular
compartmentalization of GFP or of chimeric proteins comprising
GFP fused to a regulatory sequence or to a second protein
sequence. In addition, these mutations make possible a
variety of one and two color protein assays to quantitate
expression in mammalian cells.



DETAILED DESCRIPTION OF THE INVENTION 

The present invention comprises mutant nucleic acids 
that encode engineered GFPs having a greater cellular 
fluorescence than either native GFP or unaltered ("wild type") 
recombinant GFP, and the mutant GFPs themselves. It further 
comprises a subset of mutant GFPs that are mutant blue 
fluorescent proteins ("BFPs") that are derived from a 
published BFP, designated BFP(Tyr67→His), wherein the mutant
BFPs have a cellular fluorescence that is at least five times
greater, preferably ten times greater, and most preferably 20
times greater than that of BFP(Tyr67→His). The invention also
encompasses compositions such as vectors and cells that
comprise either the mutant nucleic acids or the mutant protein
gene products. The mutant GFP nucleic acids and proteins may
be used to detect and quantify gene expression in living
cells, and to detect and quantify tissue specific expression
and subcellular distribution of GFP or of GFP fused to other
proteins.

I. General Definitions

Unless defined otherwise, all technical and 
scientific terms used herein have the same meaning as commonly 
understood by one of ordinary skill in the art to which this 
invention belongs. Singleton et al. (1994) Dictionary of
Microbiology and Molecular Biology, second edition, John Wiley 
and Sons (New York) provides one of skill with a general 
dictionary of many of the terms used in this invention. 
Although any methods and materials similar or equivalent to 
those described herein can be used in the practice or testing 
of the present invention, the preferred methods and materials 
are described. For purposes of the present invention, the 
following terms are defined below. 

The symbols, abbreviations and definitions used
herein are set forth below:

DNA, deoxyribonucleic acid
RNA, ribonucleic acid
mRNA, messenger RNA
cDNA, complementary DNA (enzymatically synthesized from an mRNA sequence)
A, Adenine
T, Thymine
G, Guanine
C, Cytosine
U, Uracil
GFP, Green Fluorescent Protein
BFP, Blue Fluorescent Protein

Amino acids are sometimes referred to herein by the 
conventional one or three letter codes. 

Wild type green fluorescent protein ("wtGFP") refers 
to the 239 amino acid sequence described by Chalfie et al.,
Science 263, 802-805, 1994, the nucleotide sequence of which
is set out as SEQ ID NO:1, and the amino acid sequence of
which is set out as SEQ ID NO:2. This sequence differs from
the original 238 amino acid GFP isolated from the
bioluminescent jellyfish Aequorea victoria in that one amino
acid has been inserted after position 2 of the 238 amino acid
sequence. When reference in this application is made to an
amino acid position of GFP, the position is made with
reference to that described by Chalfie et al., supra and thus
of SEQ ID NO:2.

The term "blue fluorescent protein" (BFP) refers to 
mutants of wtGFP wherein the tyrosine at position 67 is 
converted to a histidine, which mutants emit a blue 
fluorescence. The non-limiting prototype is herein designated
BFP(Tyr67→His).

A shorthand designation for mutations that result in 
a change in amino acid sequence is the one or three letter 
code for the original amino acid, the number of the position 
of the amino acid in the wtGFP sequence, followed by the one 
or three letter code for the new amino acid. Thus, Phe65Leu 
or F65L both designate a mutation wherein the phenylalanine at 
position 65 of the wtGFP is converted to leucine. 
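
The shorthand described above lends itself to mechanical parsing. The following is a minimal sketch under stated assumptions, not a method taken from this disclosure: the names `Mutation`, `parse_mutation` and `THREE_TO_ONE` are hypothetical, and only the twenty standard amino acids are recognized.

```python
import re
from typing import NamedTuple

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

# Original residue, position in the wtGFP sequence, new residue.
_PATTERN = re.compile(r"^([A-Za-z]{3}|[A-Za-z])(\d+)([A-Za-z]{3}|[A-Za-z])$")


class Mutation(NamedTuple):
    original: str  # one-letter code of the original amino acid
    position: int  # position in the wtGFP sequence
    new: str       # one-letter code of the new amino acid


def _one_letter(code: str) -> str:
    """Normalize a one- or three-letter residue code to one letter."""
    if len(code) == 3:
        letter = THREE_TO_ONE.get(code.capitalize())
    else:
        letter = code.upper() if code.upper() in ONE_LETTER else None
    if letter is None:
        raise ValueError(f"unknown amino acid code: {code!r}")
    return letter


def parse_mutation(text: str) -> Mutation:
    """Parse shorthand such as 'Phe65Leu' or 'F65L'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a mutation designation: {text!r}")
    original, position, new = match.groups()
    return Mutation(_one_letter(original), int(position), _one_letter(new))


print(parse_mutation("Phe65Leu") == parse_mutation("F65L"))  # True
```

Both spellings of the example in the text normalize to the same record, which is the point of the shorthand: the one-letter and three-letter forms are interchangeable.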

Salts of any of the proteins described herein will 
naturally occur when such proteins are present in (or isolated 
from) aqueous solutions of various pHs. All salts of peptides
having the indicated biological activity are considered to be
within the scope of the present invention. Examples include
alkali, alkaline earth, and other metal salts of carboxylic
acid residues, acid addition salts (e.g., HCl) of amino
residues, and zwitterions formed by reactions between
carboxylic acid and amino acid residues within the same
molecule.

The terms "bioluminescent" and "fluorescent" refer
to the ability of GFP or of a derivative thereof to emit light 
("emitted or fluorescent light") of a characteristic 
wavelength when excited by light which is generally of a 
characteristic and different wavelength than that used to 
generate the emission. 

The term "cellular fluorescence" denotes the 
fluorescence of a GFP-derived protein of the present invention 
when expressed in a cell, especially a mammalian cell. 

The term "nucleic acid" refers to a 
deoxyribonucleotide or ribonucleotide polymer in either 
single- or double-stranded form, and unless specifically 
limited, encompasses known analogues of natural nucleotides 
that hybridize to nucleic acids in a manner similar to 
naturally occurring nucleotides. Unless otherwise indicated, 
a particular nucleic acid sequence implicitly provides the 
complementary sequence thereof, as well as the sequence 
explicitly indicated. As used herein, the terms "nucleic 
acid" and "gene" are interchangeable, and they encompass the 
term "cDNA." 

The phrase "a nucleic acid sequence encoding" refers 
to a nucleic acid which contains sequence information that, if 
translated, yields the primary amino acid sequence of a 
specific protein or peptide. This phrase specifically 
encompasses degenerate codons (i.e., different codons which 
encode a single amino acid) of the native sequence or 
sequences which may be introduced to conform with codon 
preference in a specific host cell. 
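
Codon degeneracy can be illustrated with the standard genetic code. The sketch below is an illustration only: the names `CODON_TABLE` and `degenerate_codons` are hypothetical, and it assumes the standard (universal) code written with DNA bases rather than any host-specific codon preference table.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degenerate_codons(amino_acid: str) -> list:
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(degenerate_codons("F"))  # ['TTC', 'TTT']
print(degenerate_codons("L"))  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
```

Any of the six leucine codons would, for example, encode the leucine at position 65 discussed in this application; which one is chosen may depend on the codon preference of the host cell.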

The phrase "nucleic acid construct" denotes a 
nucleic acid that is composed of two or more nucleic acid 
sequences that are derived from different sources and that are
ligated together using methods known in the art. 

The term "regulatory sequence" denotes all the non-coding
elements of a nucleic acid sequence required for the
correct and efficient expression of the "coding region" (i.e.,
the region that actually encodes the amino acid sequence of a
peptide or protein), e.g., binding sites for polymerases and
transcription factors, transcription and translation
initiation and termination sequences, TATA box, a promoter to
direct transcription, a ribosome binding site for
translational initiation, polyadenylation sequences, enhancer
elements.

The term "isolated" refers to material which is
substantially or essentially free from components which
normally accompany it as found in its native state (for
example, a band on a gel). The isolated nucleic acids and the
isolated proteins of this invention do not contain materials
normally associated with their in situ environment, in
particular, nuclear, cytosolic or membrane associated proteins
or nucleic acids other than those nucleic acids which are
indicated. The term "homogeneous" refers to a peptide or DNA
sequence where the primary molecular structure (i.e., the
sequence of amino acids or nucleotides) of substantially all
molecules present in the composition under consideration is
identical. The term "substantially" used in the preceding
sentences preferably means at least 80% by weight, more
preferably at least 95% by weight, and most preferably at
least 99% by weight.

The nucleic acids of this invention, whether RNA,
cDNA, genomic DNA, or a hybrid of the various combinations,
are synthesized in vitro or are isolated from natural sources
or recombinant clones. The nucleic acids claimed herein are
present in transformed or transfected whole cells, in
transformed or transfected cell lysates, or in a partially
purified or substantially pure form. The nucleic acids of the
present invention are obtained as homogeneous preparations.
They may be prepared by standard techniques well known in the
art, including selective precipitation with such substances as
ammonium sulfate, isopropyl alcohol, ethyl alcohol, and/or
exclusion, ion exchange or affinity column chromatography,
immunopurification methods, and others.

The phrase "conservatively modified variants 
thereof," when used with reference to a protein, denotes 
conservative amino acid substitutions in which both the 
original and the substituted amino acids have similar 
structure (e.g., the R group contains a carboxylic acid) and
properties (e.g., the original and the substituted amino acids
are acidic, such as glutamic and aspartic acid), such that the
substitutions do not essentially alter specified properties of
the protein, such as fluorescence. Amino acid substitutions
that are conservative are well known in the art. The phrase
"conservatively modified variants thereof," when used to
describe a reference nucleic acid, denotes nucleic acids 
having nucleotide substitutions that yield degenerate codons 
for a given amino acid or that encode conservative amino acid 
substitutions, as compared to the reference nucleic acid. 

The term "recombinant" or "engineered" when used 
with reference to a nucleic acid or a protein generally 
denotes that the composition or primary sequence of said 
nucleic acid or protein has been altered from the naturally 
occurring sequence using experimental manipulations well known
to those skilled in the art. It may also denote that a
nucleic acid or protein has been isolated and cloned into a
vector, or that the nucleic acid has been introduced into
or expressed in a cell or cellular environment other than the
cell or cellular environment in which said nucleic acid or
protein may be found in nature. The phrase "engineered
Aequorea victoria fluorescent protein" specifically
encompasses a protein obtained by introducing one or more
sequence alterations into the coding region of a nucleic acid
that encodes wild type Aequorea victoria GFP, wherein the gene
product of the engineered nucleic acid is a fluorescent
protein recognized by antisera to wild type Aequorea victoria
GFP.

The term "recombinant" or "engineered" when used 
with reference to a cell indicates that, as a result of 
experimental manipulation, the cell replicates or expresses a
nucleic acid or expresses a peptide or protein encoded by a
nucleic acid, whose origin is exogenous to the cell.
Recombinant cells can express nucleic acids that are not found
within the native (non-recombinant) form of the cell.
Recombinant cells can also express nucleic acids found in the
native form of the cell wherein the nucleic acids are
re-introduced into the cell by artificial means.

The term "vector" denotes an engineered nucleic acid
construct that contains sequence elements that mediate the
replication of the vector sequence and/or the expression of
coding sequences present on the vector. Examples of vectors
include eukaryotic and prokaryotic plasmids, viruses (for
example, the HIV virus), cosmids, phagemids, and the like.

The term "operably linked" refers to functional linkage
between a first nucleic acid (for example, an expression
control sequence such as a promoter or an array of
transcription factor binding sites) and a second nucleic acid
sequence, wherein the expression control sequence directs
transcription of the nucleic acid corresponding to the second
sequence. One or more selected isolated nucleic acids may be
operably linked to a vector by methods known in the art.

"Transduction" or "transformation" denotes the
process whereby exogenous extracellular DNA is introduced into
a cell, such that the cell is capable of replicating and/or
expressing the exogenous DNA. Generally, a selected nucleic
acid is first inserted into a vector and the vector is then
introduced into the cell. For example, plasmid DNA that is
introduced under appropriate environmental conditions may
undergo replication in the transformed cell, and the
replicated copies are distributed to progeny cells when cell
division occurs. As a result, a new cell line is established,
containing the plasmid and carrying the genetic determinants
thereof. Transformation by a plasmid in this manner, where
the plasmid genes are maintained in the cell line by plasmid
replication, occurs at high frequency when the transforming
plasmid DNA is in closed loop form, and does not or rarely
occurs if linear plasmid DNA is used.

All the patents and publications cited in this
disclosure are indicative of the level of skill of those
skilled in the art to which this invention pertains and are
all herein individually incorporated by reference for all
purposes.

The GFP Mutants and Their Expression

A. The GFP mutants 

The isolated nucleic acids reported here are those 
that encode an engineered protein derived from Aequorea 
victoria green fluorescent protein ("GFP") having a
fluorescence at maximum emission that is at least five times 
greater, preferably ten times greater, and most preferably 
twenty times greater than the fluorescence at maximum emission 
of wild type GFP. In one embodiment, a nucleic acid encodes 
for leucine at amino acid position 65. This amino acid 
position is important for the enhanced fluorescence. In 
another embodiment the engineered isolated GFP nucleic acid 
also encodes for threonine at amino acid position 168. In an 
additional embodiment, the engineered isolated GFP nucleic 
acid further encodes for cysteine at amino acid position 66. 

Also described here are GFP mutants that have
enhanced blue fluorescent properties. These mutants have an
isolated nucleic acid that encodes an engineered Aequorea
victoria blue fluorescent protein that encodes for histidine
at amino acid position 67, leucine at amino acid position 65
and has a cellular fluorescence that is at least five times
greater, preferably 10 times greater, most preferably 20 times
greater than that of BFP(Tyr67→His). An alternative isolated
BFP nucleic acid is one that encodes for an engineered
Aequorea victoria blue fluorescent protein wherein the
engineered BFP has histidine at amino acid position 67 and
alanine at amino acid position 164. A third engineered
isolated BFP nucleic acid sequence is one that has histidine
at amino acid position 67, leucine at amino acid position 65
and alanine at amino acid position 164.

The nucleic acid and amino acid sequences for the
wild type GFP are set out in SEQ ID NO:1 and SEQ ID NO:2. The
sequence is well-known, well-described and readily available
for manipulation and use. Vectors bearing the nucleic acid
sequence are readily available commercially from, for example,
Clontech Laboratories, Inc., Palo
Alto, CA. Clontech provides a line of reporter vectors for
GFP, including the cDNA construct described by Chalfie et
al., supra, a promoterless GFP vector for monitoring the
expression of cloned promoters in mammalian cells, and a
series of vectors for creating fusion proteins to either the
amino or carboxy terminus of GFP.

One of skill in the art will recognize many ways of
generating alterations in a given nucleic acid sequence. Such
well-known methods include site-directed mutagenesis, PCR
amplification using degenerate oligonucleotides, exposure of
cells containing the nucleic acid to mutagenic agents or
radiation, chemical synthesis of a desired oligonucleotide
(e.g., in conjunction with ligation and/or cloning to generate
large nucleic acids) and other well-known techniques. See,
e.g., Berger and Kimmel, Guide to Molecular Cloning
Techniques, Methods in Enzymology Volume 152, Academic Press,
Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular
Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook);
and Current Protocols in Molecular Biology, F.M. Ausubel et
al., eds., Current Protocols, a joint venture between Greene
Publishing Associates, Inc. and John Wiley & Sons, Inc. (1994
Supplement) (Ausubel); Pirrung et al., U.S. Patent No.
5,143,854; and Fodor et al., Science, 251, 767-77 (1991).

Product information from manufacturers of biological reagents
and experimental equipment also provides information useful in
known biological methods. Such manufacturers include the
SIGMA Chemical Company (Saint Louis, MO), R&D Systems
(Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway,
NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes
Corp., Aldrich Chemical Company (Milwaukee, WI), Glen
Research, Inc., GIBCO BRL Life Technologies, Inc.
(Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka
Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster
City, CA), as well as many other commercial sources known to
one of skill. Using these techniques, it is possible to
substitute at will any nucleotide in a nucleic acid that
encodes any GFP or BFP disclosed herein or any amino acid in a
GFP or BFP described herein for a predetermined nucleotide or
amino acid. For example, it is possible to generate at will
modified GFPs and BFP(Tyr67→His)s that contain leucine at
position 65 and one or two or three additional mutations at
any other position of the wtGFP or BFP(Tyr67→His).

The sequence of the cloned genes and synthetic
oligonucleotides can be verified using the chemical
degradation method of A.M. Maxam et al. (1980), Methods in
Enzymology 65:499-560. The sequence can be confirmed after
the assembly of the oligonucleotide fragments into the
double-stranded DNA sequence using the method of Maxam and
Gilbert, supra, or the chain termination method for sequencing
double-stranded templates of R.B. Wallace et al. (1981), Gene,
16:21-26. DNA sequencing may also be performed by the
PCR-assisted fluorescent terminator method (ReadyReaction
DyeDeoxy Terminator Cycle Sequencing Kit, ABI, Columbia, MD)
according to the manufacturer's instructions, using the ABI
Model 373A DNA Sequencing System. Sequencing data is analyzed
using the commercially available Sequencher program (Gene
Codes, Ann Arbor, MI).
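
By way of illustration only, the verification step can be thought of as a base-by-base comparison of the sequence read against the designed sequence. The following Python sketch assumes the read has already been trimmed and aligned to the same length as the design; the function name and the short example sequences are hypothetical and do not correspond to any sequence disclosed herein or to the Sequencher program.

```python
def find_mismatches(expected, observed):
    """Return (1-based position, expected base, observed base) per difference."""
    expected, observed = expected.upper(), observed.upper()
    if len(expected) != len(observed):
        # This simple sketch does not handle insertions or deletions.
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, e, o)
        for i, (e, o) in enumerate(zip(expected, observed))
        if e != o
    ]


# Hypothetical 12-nt design and a read carrying one substitution.
design = "ATGAGTAAAGGA"
read = "ATGACTAAAGGA"
print(find_mismatches(design, read))  # [(5, 'G', 'C')]
```

An empty list indicates that the read agrees with the design over the compared region.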

Expression of Mutant GFP

Clearly, the nucleic acid sequences of the present 
invention are excellent reporter sequences since the expressed 
proteins can be readily detected by fluorescence as described 
below. The sequences can be used in conjunction with any 
application appreciated to date for GFP and further in 
applications where a greater degree of fluorescence is 
required. Expression of the sequences described herein,
whether expression is desired alone or in combination with
other sequences of interest, is described below.

Vectors to which selected foreign nucleic acids are
operably linked may be used to introduce these selected
nucleic acids into host cells and mediate their replication
and/or expression. Cloning vectors are useful for replicating
the foreign nucleic acids and obtaining clones of specific
foreign nucleic acid-containing vectors. Expression vectors
mediate the expression of the foreign nucleic acid. Some
vectors are both cloning and expression vectors.

Once a nucleic acid is synthesized or isolated and
inserted into a vector and cloned, one may express the nucleic
acid in a variety of recombinantly engineered cells known to
those of skill in the art. As used herein, "expression"
refers to transcription of nucleic acids, either without or
preferably with subsequent translation.

Expression of a mutant BFP or of wild type or mutant
GFP can be enhanced by including multiple copies of the
GFP-encoding nucleic acid in a transformed host, by selecting a
vector known to reproduce in the host, thereby producing large
quantities of protein from exogenous inserted DNA (such as
pUC8, ptac12, or pIN-III-ompA1, 2, or 3), or by any other
known means of enhancing peptide expression. In all cases,
wtGFP or mutant GFPs will be expressed when the DNA sequence
is functionally inserted into a vector. "Functionally
inserted" means that it is inserted in proper reading frame
and orientation. Typically, a GFP gene will be inserted
downstream from a promoter and will be followed by a stop
codon, although production as a hybrid protein followed by
cleavage may be used, if desired.
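
As a rough illustration of what "proper reading frame and orientation" implies for a simple, non-hybrid construct, the following Python sketch checks that a coding insert begins with ATG, has a length divisible by three, ends with a stop codon, and contains no premature in-frame stop. The function names and test sequences are hypothetical; the check is an assumption-laden simplification and ignores hybrid-protein designs, in which the insert need not begin with its own start codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_functionally_inserted(orf):
    """True if the insert reads ATG ... stop in a single unbroken frame."""
    orf = orf.upper()
    if len(orf) < 6 or len(orf) % 3 != 0:
        return False
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(c in STOP_CODONS for c in codons[1:-1])
    )


insert = "ATGAGTAAAGGATAA"  # hypothetical: ATG AGT AAA GGA TAA
print(is_functionally_inserted(insert))                      # True
print(is_functionally_inserted(reverse_complement(insert)))  # False
```

The second call shows that the same fragment cloned in the opposite orientation fails the check.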

Examples of cells which are suitable for the cloning
and expression of the nucleic acids of the invention include
bacteria, yeast, filamentous fungi, insect (especially
employing baculoviral vectors), and mammalian cells, in
particular cells capable of being maintained in tissue
culture.

Host cells are competent or rendered competent for
transformation by various means. There are several well-known
methods of introducing DNA into animal cells. These include:
calcium phosphate precipitation, fusion of the recipient cells
with bacterial protoplasts containing the DNA, treatment of
the recipient cells with liposomes containing the DNA, DEAE
dextran, receptor-mediated endocytosis, electroporation and
micro-injection of the DNA directly into the cells.

It is expected that those of skill in the art are 
knowledgeable in the numerous systems available for cloning 
and expression of nucleic acids. In brief summary, the 
expression of natural or synthetic nucleic acids is typically 
achieved by operably linking a nucleic acid of interest to a 
promoter (which is either constitutive or inducible), and
incorporating the construct into an expression vector. The 
vectors are suitable for replication and integration in 
prokaryotes, eukaryotes, or both. Typical cloning vectors 
contain transcription and translation terminators, 
transcription and translation initiation sequences, and 
promoters useful for regulation of the expression of the 
particular nucleic acid. The vectors optionally comprise 
generic expression cassettes containing at least one 
independent terminator sequence, sequences permitting 
replication of the cassette in eukaryotes, or prokaryotes, or
both (e.g., shuttle vectors), and selection markers for both
prokaryotic and eukaryotic systems. See, e.g., Sambrook and
Ausubel (both supra).

1. Expression in Prokaryotes

Prokaryotic systems for cloning and/or expressing 
engineered GFP or BFP proteins are available using E. coli,
Bacillus sp. and Salmonella (Palva, I. et al. (1983), Gene
22:229-235; Mosbach, K. et al. (1983), Nature 302:543-545). To
obtain high level expression in a prokaryotic system of a 
cloned nucleic acid such as those encoding engineered GFPs or 
BFPs, it is essential to construct expression vectors which 
contain, at a minimum, a strong promoter to direct 
transcription, a ribosome binding site for translational 
initiation, a transcription/translation terminator, a 
bacterial replicon, a nucleic acid encoding antibiotic 
resistance to permit selection of bacteria that harbor 
recombinant plasmids, and unique restriction sites in
nonessential regions of the plasmid to allow insertion of 
foreign nucleic acids. The particular antibiotic resistance
gene chosen is not critical; any of the many resistance genes
known in the art are suitable. Examples of regulatory regions
suitable for this purpose in E. coli are the promoter and
operator region of the E. coli tryptophan biosynthetic pathway
as described by Yanofsky, C. (1984), J. Bacteriol.,
158:1018-1024, and the leftward promoter of phage lambda (PL)
as described by Herskowitz, I. and Hagen, D. (1980), Ann. Rev.
Genet., 14:399-445.
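
The minimum set of elements listed above lends itself to a simple checklist. The Python sketch below is illustrative only: the feature labels are hypothetical names chosen for this example and do not follow any particular annotation standard or software package.

```python
# Elements named in the text as the minimum for high-level prokaryotic
# expression. The string labels themselves are hypothetical.
REQUIRED_ELEMENTS = {
    "promoter",
    "ribosome_binding_site",
    "terminator",
    "bacterial_replicon",
    "antibiotic_resistance",
    "unique_restriction_site",
}


def missing_elements(vector_features):
    """Return the required elements absent from a vector, sorted by name."""
    return sorted(REQUIRED_ELEMENTS - set(vector_features))


draft_vector = ["promoter", "bacterial_replicon"]
print(missing_elements(draft_vector))
```

A design is complete, in the limited sense of this checklist, when the returned list is empty.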

The particular vector used to transport the genetic 
information into the cell is not particularly critical. Any 
of the conventional vectors used for replication, cloning 
and/or expression in prokaryotic cells may be used.

The foreign nucleic acid can be incorporated into a
nonessential region of the host cell's chromosome. This is
achieved by first inserting the nucleic acid into a vector
such that it is flanked by regions of DNA homologous to the
insertion site in the host chromosome. After introduction of
the vector into a host cell, the foreign nucleic acid is
incorporated into the chromosome by homologous recombination
between the flanking sequences and chromosomal DNA.
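
The outcome of such a double-crossover event can be sketched as a string operation: the chromosomal segment lying between the two regions of homology is exchanged for the foreign nucleic acid. The Python example below is a deliberately simplified model under stated assumptions (exact-match homology arms, a single occurrence of each arm, toy sequences); the function name and sequences are hypothetical.

```python
def integrate_by_homologous_recombination(chromosome, left_arm, insert, right_arm):
    """Replace the chromosomal segment between two homology arms with insert."""
    start = chromosome.find(left_arm)
    if start == -1:
        raise ValueError("left homology arm not found in chromosome")
    left_end = start + len(left_arm)
    right_start = chromosome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")
    # Keep everything up to the end of the left arm, add the foreign
    # sequence, then resume at the start of the right arm.
    return chromosome[:left_end] + insert + chromosome[right_start:]


# Toy chromosome: the CCCC segment stands in for a nonessential region.
chromosome = "GGGGAAAACCCCTTTTGGGG"
result = integrate_by_homologous_recombination(chromosome, "AAAA", "ATGCAT", "TTTT")
print(result)  # GGGGAAAAATGCATTTTTGGGG
```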

Detection of the expressed protein is achieved by
methods known in the art such as radioimmunoassays, Western
blotting techniques or immunoprecipitation. Purification from
E. coli can be achieved following procedures described in U.S.
Patent No. 4,511,503.

2. Expression in Eukaryotes

Standard eukaryotic transfection methods are used to
produce mammalian, yeast or insect cell lines which express
large quantities of engineered GFP or BFP protein which are
then purified using standard techniques. See, e.g., Colley et
al. (1989), J. Biol. Chem. 264:17619-17622, and Guide to
Protein Purification, in Vol. 182 of Methods in Enzymology
(Deutscher ed., 1990), D.A. Morrison (1977), J. Bact.,
132:349-351, or by J.E. Clark-Curtiss and R. Curtiss (1983),
Methods in Enzymology 101:347-362, Eds. R. Wu et al.,
Academic Press, New York.

The particular eukaryotic expression vector used to 
transport the genetic information into the cell is not 
particularly critical. Any of the conventional vectors used 
for expression in eukaryotic cells may be used. Expression 
vectors containing regulatory elements from eukaryotic viruses 
such as retroviruses are typically used. SV40 vectors include
pSVT7 and pMT2. Vectors derived from bovine papilloma virus
include pBV-1MTHA, and vectors derived from Epstein Barr virus
include pHEBO and p205. Other exemplary vectors include
pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and
any other vector allowing expression of proteins under the
direction of the SV-40 early promoter, SV-40 later promoter,
metallothionein promoter, murine mammary tumor virus promoter,
Rous sarcoma virus promoter, polyhedrin promoter, or other
promoters shown effective for expression in eukaryotic cells.

The expression vector typically comprises a 
eukaryotic transcription unit or expression cassette that 
contains all the elements required for the expression of the 
engineered GFP or BFP DNA in eukaryotic cells. A typical 
expression cassette contains a promoter operably linked to the 
DNA sequence encoding an engineered GFP or BFP protein and
signals required for efficient polyadenylation of the
transcript.

Eukaryotic promoters typically contain two types of 
recognition sequences, the TATA box and upstream promoter 
elements. The TATA box, located 25-30 base pairs upstream of
the transcription initiation site, is thought to be involved 
in directing RNA polymerase to begin RNA synthesis. The other 
upstream promoter elements determine the rate at which 
transcription is initiated. 

Enhancer elements can stimulate transcription up to
1,000 fold from linked homologous or heterologous promoters.
Enhancers are active when placed downstream or upstream from
the transcription initiation site. Many enhancer elements
derived from viruses have a broad host range and are active in
a variety of tissues. For example, the SV40 early gene
enhancer is suitable for many cell types. Other
enhancer/promoter combinations that are suitable for the
present invention include those derived from polyoma virus,
human or murine cytomegalovirus, the long terminal repeat from
various retroviruses such as murine leukemia virus, murine or
Rous sarcoma virus and HIV. See, Enhancers and Eukaryotic
Expression, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.
1983, which is incorporated herein by reference.

In the construction of the expression cassette, the
promoter is preferably positioned about the same distance from
the heterologous transcription start site as it is from the
transcription start site in its natural setting. As is known
in the art, however, some variation in this distance can be
accommodated without loss of promoter function.

In addition to a promoter sequence, the expression
cassette should also contain a transcription termination
region downstream of the structural gene to provide for
efficient termination. The termination region may be obtained
from the same gene as the promoter sequence or may be obtained
from different genes.

If the mRNA encoded by the structural gene is to be
efficiently translated, polyadenylation sequences are also
commonly added to the vector construct. Two distinct sequence
elements are required for accurate and efficient
polyadenylation: GU or U rich sequences located downstream
from the polyadenylation site and a highly conserved sequence
of six nucleotides, AAUAAA, located 11-30 nucleotides
upstream. Termination and polyadenylation signals that are
suitable for the present invention include those derived from
SV40, or a partial genomic copy of a gene already resident on
the expression vector.
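
The two sequence requirements just described can be expressed as a short check on a candidate transcript. The Python sketch below is illustrative only and rests on several assumptions that are not taken from the text: the 11-30 nucleotide distance is measured from the first base of the AAUAAA hexamer to the polyadenylation site, the downstream element is sought within a 30-nucleotide window, and "GU or U rich" is approximated as at least 70% G or U. The function name and test transcript are hypothetical.

```python
def has_polya_signals(rna, site, downstream_window=30, min_gu_fraction=0.7):
    """Check for AAUAAA 11-30 nt upstream and a GU/U-rich stretch downstream.

    `site` is the 0-based index of the polyadenylation site. The window
    size and the 0.7 threshold are illustrative assumptions.
    """
    rna = rna.upper().replace("T", "U")
    hexamer_ok = any(
        rna.startswith("AAUAAA", site - d)
        for d in range(11, 31)
        if site - d >= 0
    )
    downstream = rna[site:site + downstream_window]
    gu_ok = bool(downstream) and (
        sum(base in "GU" for base in downstream) / len(downstream)
        >= min_gu_fraction
    )
    return hexamer_ok and gu_ok


# Hypothetical transcript: hexamer starts 20 nt upstream of the site.
transcript = "C" * 20 + "AAUAAA" + "C" * 14 + "GUGUUUGUGUUUGUGUUUGU"
print(has_polya_signals(transcript, site=40))  # True
```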

In addition to the elements already described, the
expression vector of the present invention may typically
contain other specialized elements intended to increase the
level of expression of cloned nucleic acids or to facilitate
the identification of cells that carry the transfected DNA.
For instance, a number of animal viruses contain DNA sequences
that promote the extrachromosomal replication of the viral
genome in permissive cell types. Plasmids bearing these viral
replicons are replicated episomally as long as the appropriate
factors are provided by genes either carried on the plasmid or
within the genome of the host cell.

The DNA sequence encoding the engineered GFP or BFP 
protein may typically be linked to a cleavable signal peptide 
sequence to promote secretion of the encoded protein by the 
transformed cell. Such signal peptides would include, among 
others, the signal peptides from tissue plasminogen activator, 
insulin, neuron growth factor, and juvenile hormone esterase 
of Heliothis virescens. Additional elements of the cassette
may include enhancers and, if genomic DNA is used as the 
structural gene, introns with functional splice donor and 
acceptor sites. 

The vector may or may not comprise a eukaryotic 
replicon. If a eukaryotic replicon is present, then the 
vector is amplifiable in eukaryotic cells using the 
appropriate selectable marker. If the vector does not 
comprise a eukaryotic replicon, no episomal amplification is 
possible. Instead, the transfected DNA integrates into the 
genome of the transfected cell, where the promoter directs 
expression of the desired nucleic acid. 

The vectors usually comprise selectable markers 
which result in nucleic acid amplification such as the sodium, 
potassium ATPase, thymidine kinase, aminoglycoside 
phosphotransferase, hygromycin B phosphotransferase, 
xanthine-guanine phosphoribosyl transferase, CAD (carbamyl
phosphate synthetase, aspartate transcarbamylase, and
dihydroorotase), adenosine deaminase, dihydrofolate reductase,
and asparagine synthetase and ouabain selection.
Alternatively, high yield expression systems not involving
nucleic acid amplification are also suitable, such as using a
baculovirus vector in insect cells, with the engineered GFP
or BFP encoding sequence under the direction of the polyhedrin 
promoter or other strong baculovirus promoters. 

The expression vectors of the present invention will 
typically contain both prokaryotic sequences that facilitate 
the cloning of the vector in bacteria as well as one or more
eukaryotic transcription units that are expressed only in
eukaryotic cells, such as mammalian cells. The prokaryotic 
sequences are preferably chosen such that they do not 
interfere with the replication of the DNA in eukaryotic cells. 

Any of the well known procedures for introducing 
foreign nucleotide sequences into host cells may be used. 
These include the use of calcium phosphate transfection, 
polybrene, protoplast fusion, electroporation, liposomes, 
microinjection, plasmid vectors, viral vectors and any of the
other well known methods for introducing cloned genomic DNA,
cDNA, synthetic DNA or other foreign nucleic acid material
into a host cell (see Sambrook et al., supra). It is only
necessary that the particular genetic engineering procedure 
utilized be capable of successfully introducing at least one 
nucleic acid into the host cell which is capable of expressing 
the engineered GFP or BFP protein. 

3. Expression in insect cells 

The baculovirus expression vector utilizes the 
highly expressed and regulated Autographa californica nuclear
polyhedrosis virus (AcMNPV) polyhedrin promoter modified for 
the insertion of foreign nucleic acids. Synthesis of 
polyhedrin protein results in the formation of occlusion 
bodies in the infected insect cell. The baculovirus vector 
utilizes many of the protein modification, processing, and 
transport systems that occur in higher eukaryotic cells. The 
recombinant eukaryotic proteins expressed using this vector 
have been found in many cases to be antigenically,
immunogenically, and functionally similar to their natural 
counterparts. 

Briefly, a DNA sequence encoding an engineered GFP 
or BFP is inserted into a transfer plasmid vector in the 
proper orientation downstream from the polyhedrin promoter, 
and flanked on both ends with baculovirus sequences. Cultured 
insect cells, commonly Spodoptera frugiperda cells, are 
transfected with a mixture of viral and plasmid DNAs. The 
viruses that develop, some of which are recombinant viruses that
result from homologous recombination between the two DNAs, are
plated at 100-1000 plaques per plate. The plaques containing
recombinant virus can be identified visually because of their
ability to form occlusion bodies or by DNA hybridization. The
recombinant virus is isolated by plaque purification. The
resulting recombinant virus, capable of expressing engineered
GFP or BFP, is self-propagating in that no helper virus is
required for maintenance or replication. After infecting an
insect culture with recombinant virus, one can expect to find 
recombinant protein within 48-72 hours. The infection is 
essentially lytic within 4-5 days. 

There are a variety of transfer vectors into which 
the engineered GFP or BFP nucleic acid can be inserted. For a 
summary of transfer vectors see Luckow, V.A. and Summers, M.D. 
(1988), Bio/Technology 6:47-55. Preferred is the transfer
vector pAcUW21 described by Bishop, D.H.L. (1992) in Seminars 
in Virology 3:253-264. 



4. Retroviral Vectors

Retroviral vectors are particularly useful for 
modifying eukaryotic cells because of the high efficiency with 
which the retroviral vectors transduce target cells and 
integrate into the target cell genome. Additionally, the 
retroviruses harboring the retroviral vector are capable of
infecting cells from a wide variety of tissues. 

Retroviral vectors are produced by genetically 
manipulating retroviruses. Retroviruses are RNA viruses 
because the viral genome is RNA. Upon infection, this genomic 
RNA is reverse transcribed into a DNA copy which is integrated 
into the chromosomal DNA of transduced cells with a high 
degree of stability and efficiency. The integrated DNA copy 
is referred to as a provirus and is inherited by daughter 
cells as is any other gene. The wild type retroviral genome 
and the proviral DNA have three genes: the gag, the pol and
the env genes, which are flanked by two long terminal repeat
(LTR) sequences. The gag gene encodes the internal structural
(nucleocapsid) proteins; the pol gene encodes the RNA directed
DNA polymerase (reverse transcriptase); and the env gene
encodes viral envelope glycoproteins. The 5' and 3' LTRs
serve to promote transcription and polyadenylation of virion
RNAs. Adjacent to the 5' LTR are sequences necessary for
reverse transcription of the genome (the tRNA primer binding
site) and for efficient encapsulation of viral RNA into
particles (the Psi site). See Mulligan, R.C. (1983), In:
Experimental Manipulation of Gene Expression, M. Inouye (ed),
155-173; Mann, R. et al. (1983), Cell, 33:153-159; Cone, R.D.
and R.C. Mulligan (1984), Proceedings of the National Academy
of Sciences, U.S.A., 81:6349-6353.

The design of retroviral vectors is well known to
one of skill in the art. See Singer, M. and Berg, P. supra.
In brief, if the sequences necessary for encapsidation (or
packaging of retroviral RNA into infectious virions) are
missing from the viral genome, the result is a cis acting
defect which prevents encapsidation of genomic RNA. However,
the resulting mutant is still capable of directing the
synthesis of all virion proteins. Retroviral genomes from
which these sequences have been deleted, as well as cell lines
containing the mutant genome stably integrated into the
chromosome are well known in the art and are used to construct
retroviral vectors. Preparation of retroviral vectors and
their uses are described in many publications including
European Patent Application EPA 0 178 220, U.S. Patent
4,405,712, Gilboa (1986), Biotechniques 4:504-512, Mann, et
al. (1983), Cell 33:153-159, Cone and Mulligan (1984), Proc.
Natl. Acad. Sci. USA 81:6349-6353, Eglitis, M.A. et al. (1988)
Biotechniques 6:608-614, Miller, A.D. et al. (1989)
Biotechniques 7:981-990, Miller, A.D. (1992) Nature, supra,
Mulligan, R.C. (1993), supra, and Gould, B. et al., and
International Patent Application No. WO 92/07943 entitled
"Retroviral Vectors Useful in Gene Therapy." The teachings of
these patents and publications are incorporated herein by
reference.

The retroviral vector particles are prepared by
recombinantly inserting the nucleic acid encoding engineered
GFP or BFP into a retrovirus vector and packaging the vector
with retroviral capsid proteins by use of a packaging cell
line. The resultant retroviral vector particle is incapable
of replication in the host cell and is capable of integrating
into the host cell genome as a proviral sequence containing
the engineered GFP or BFP nucleic acid. As a result, the
patient is capable of producing engineered GFP or BFP and 
metabolize glycogen to completion. 

Packaging cell lines are used to prepare the 
retroviral vector particles. A packaging cell line is a 
genetically constructed mammalian tissue culture cell line 
that produces the necessary viral structural proteins required 
for packaging, but which is incapable of producing infectious 
virions. Retroviral vectors, on the other hand, lack the 
structural genes but have the nucleic acid sequences necessary 
for packaging. To prepare a packaging cell line, an 
infectious clone of a desired retrovirus, in which the 
packaging site has been deleted, is constructed. Cells 
comprising this construct will express all structural proteins 
but the introduced DNA will be incapable of being packaged. 
Alternatively, packaging cell lines can be produced by 
transforming a cell line with one or more expression plasmids
encoding the appropriate core and envelope proteins. In these
cells, the gag, pol, and env genes can be derived from the
same or different retroviruses. 
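
The division of labor between packaging cell line and vector can be summarized in a small model: structural proteins act in trans and may be supplied by any genome in the cell, whereas the packaging site acts in cis and must be present on the RNA that is to be packaged. The Python sketch below is a conceptual illustration only; the class, function and gene labels (including "gfp") are hypothetical simplifications and omit LTRs, primer binding sites and all other real requirements.

```python
from dataclasses import dataclass

STRUCTURAL_GENES = frozenset({"gag", "pol", "env"})


@dataclass(frozen=True)
class RetroviralGenome:
    genes: frozenset
    has_packaging_site: bool


def can_self_replicate(genome):
    """A genome spreads on its own only with all genes and the packaging site."""
    return STRUCTURAL_GENES <= genome.genes and genome.has_packaging_site


def packaged_genomes(cell_genomes):
    """Return the genomes that end up inside particles made by one cell."""
    supplied = set().union(*(g.genes for g in cell_genomes))
    if not STRUCTURAL_GENES <= supplied:
        return []  # no particles are formed without gag, pol and env
    return [g for g in cell_genomes if g.has_packaging_site]


helper = RetroviralGenome(frozenset({"gag", "pol", "env"}), False)
vector = RetroviralGenome(frozenset({"gfp"}), True)

print(packaged_genomes([helper, vector]) == [vector])          # True
print(can_self_replicate(helper), can_self_replicate(vector))  # False False
```

In this model only the vector genome is packaged, and neither genome alone can give rise to infectious, replicating virus.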

A number of packaging cell lines suitable for the 
present invention are available in the prior art. Examples of 
these cell lines include Crip, GPE86, PA317 and PG13. See
Miller et al. (1991), J. Virol. 65:2220-2224, which is
incorporated herein by reference. Examples of other packaging
cell lines are described in Cone, R. and Mulligan, R.C.
(1984), Proceedings of the National Academy of Sciences,
U.S.A., 81:6349-6353 and in Danos, O. and R.C. Mulligan
(1988), Proceedings of the National Academy of Sciences,
U.S.A., 85:6460-6464, Eglitis, M.A. et al. (1988)
Biotechniques 6:608-614, also all incorporated herein by
reference. 

Packaging cell lines capable of producing retroviral 
vector particles with chimeric envelope proteins may be used. 
Alternatively, amphotropic or xenotropic envelope proteins,
such as those produced by PA317 and GPX packaging cell lines,
may be used to package the retroviral vectors.

Transforming cells with nucleic acids can involve,
for example, incubating viral vectors (e.g., retroviral or
adeno-associated viral vectors) containing the nucleic acids
with cells within the host range of the vector. See, e.g.,
Methods in Enzymology, Vol. 185, Academic Press, Inc., San
Diego, CA (D.V. Goeddel, ed.) (1990) or M. Krieger (1990),
Gene Transfer and Expression -- A Laboratory Manual, Stockton
Press, New York, NY, and the references cited therein.

5. Transformation with adeno-associated virus

Adeno-associated viruses (AAVs) require helper
viruses such as adenovirus or herpes virus to achieve
productive infection. In the absence of helper virus
functions, AAV integrates (site-specifically) into a host
cell's genome, but the integrated AAV genome has no pathogenic
effect. The integration step allows the AAV genome to remain
genetically intact until the host is exposed to the
appropriate environmental conditions (e.g., a lytic helper
virus), whereupon it re-enters the lytic life-cycle. Samulski
(1993), Current Opinion in Genetics and Development 3:74-80 and
the references cited therein provide an overview of the AAV
life cycle.

AAV-based vectors are used to transduce cells with
target nucleic acids, e.g., in the in vitro production of
nucleic acids and peptides, and in in vivo and ex vivo gene
therapy procedures. See, West et al. (1987), Virology 160:38-47;
Carter et al. (198?) U.S. Patent No. 4,797,368; Carter et
al. (1993), WO 93/24641; Kotin (1994), Human Gene Therapy
5:793-801; Muzyczka (1994), J. Clin. Invest. 94:1351 and
Samulski (supra) for an overview of AAV vectors.

Recombinant AAV vectors (rAAV vectors) deliver
foreign nucleic acids to a wide range of mammalian cells
(Hermonat & Muzyczka (1984), Proc. Natl. Acad. Sci. USA
81:6466-6470; Tratschin et al. (1985), Mol. Cell. Biol.
5:3251-3260), integrate into the host chromosome (McLaughlin
et al. (1988), J. Virol. 62:1963-1973), and show stable
expression of the transgene in cell and animal models (Flotte
et al. (1993), Proc. Natl. Acad. Sci. USA 90:10613-10617).
Moreover, unlike some retroviral vectors, rAAV vectors are
able to infect non-dividing cells (Podsakoff et al. (1994), J.
Virol. 68:5656-66; Flotte et al. (1994), Am. J. Respir. Cell
Mol. Biol. 11:517-521). Further advantages of rAAV vectors
include the lack of an intrinsic strong promoter, thus
avoiding possible activation of downstream cellular sequences,
and their naked icosahedral capsid structure, which renders
them stable and easy to concentrate by common laboratory
techniques. rAAV vectors are used to inhibit, e.g., viral
infection, by including anti-viral transcription cassettes in
the rAAV vector which comprise an inhibitor of the invention.

6. Expression in recombinant vaccinia virus-infected cells

The nucleic acid encoding engineered GFP or BFP is
inserted into a plasmid designed for producing recombinant
vaccinia, such as pGS62, Langford, C.L. et al. (1986), Mol.
Cell. Biol. 6:3191-3199. This plasmid consists of a cloning
site for insertion of foreign nucleic acids, the P7.5 promoter
of vaccinia to direct synthesis of the inserted nucleic acid,
and the vaccinia TK gene flanking both ends of the foreign
nucleic acid.

When the plasmid containing the engineered GFP or 
BFP nucleic acid is constructed, the nucleic acid can be 
transferred to vaccinia virus by homologous recombination in 
the infected cell. To achieve this, the recombinant plasmid
is transfected by standard calcium phosphate precipitation
techniques into suitable recipient cells already infected
with the desired strain of vaccinia virus, such as Wyeth,
Lister, WR or Copenhagen. Homologous recombination
occurs between the TK gene in the virus and the flanking TK 
gene sequences in the plasmid. This results in a recombinant 
virus with the foreign nucleic acid inserted into the viral TK 
gene, thus rendering the TK gene inactive. Cells containing 
recombinant viruses are selected by adding medium containing 
5-bromodeoxyuridine, which is lethal for cells expressing a TK 
gene.

Confirmation of production of recombinant virus is
achieved by DNA hybridization using cDNA encoding the
engineered GFP or BFP and by immunodetection techniques using
antibodies specific for the expressed protein. Virus stocks
may be prepared by infection of cells such as HeLa S3 spinner
cells and harvesting of virus progeny.

7. Expression in cell cultures 

GFP- or BFP-encoding nucleic acids can be ligated to
various expression vectors for use in transforming host cell
cultures. The culture of cells used in conjunction with the
present invention is well known in the art. Freshney (1994)
(Culture of Animal Cells, a Manual of Basic Technique, third
edition, Wiley-Liss, New York), Kuchler et al. (1977)
Biochemical Methods in Cell Culture and Virology, Kuchler,
Dowden, Hutchinson and Ross, Inc., and the references
cited therein provide a general guide to the culture of
cells. Illustrative cell cultures useful for the production
of recombinant proteins include cells of insect or mammalian
origin. Mammalian cell systems often will be in the form of
monolayers of cells, although mammalian cell suspensions are
also used. Illustrative examples of mammalian cell lines
include monocytes, lymphocytes, macrophage, VERO and HeLa
cells, Chinese hamster ovary (CHO) cell lines, WI38, BHK,
COS-7 or MDCK cell lines (see, e.g., Freshney, supra).

Cells of mammalian origin are illustrative of cell
cultures useful for the production of the engineered GFP or
BFP. Mammalian cell systems often will be in the form of
monolayers of cells although mammalian cell suspensions may
also be used. Illustrative examples of mammalian cell lines
include VERO and HeLa cells, Chinese hamster ovary (CHO) cell
lines, WI38, BHK, COS-7 or MDCK cell lines.

As indicated above, the vector, e.g., a plasmid,
which is used to transform the host cell, preferably contains
DNA sequences to initiate transcription and sequences to
control the translation of the engineered GFP or BFP nucleic
acid sequence. These sequences are referred to as expression
control sequences. Illustrative expression control sequences
are obtained from the SV-40 promoter (Science 222:524-527,
(1983)), the CMV I.E. promoter (Proc. Natl. Acad. Sci.
81:659-663, (1984)) or the metallothionein promoter (Nature
296:39-42, (1982)). The cloning vector containing the
expression control sequences is cleaved using restriction
enzymes and adjusted in size as necessary or desirable and
ligated with sequences encoding the engineered GFP or BFP
protein by means well known in the art.

The vectors for transforming cells in culture 
typically contain gene sequences to initiate transcription and 
translation of the engineered GFP or BFP gene. These 
sequences need to be compatible with the selected host cell. 
In addition, the vectors preferably contain a marker to
provide a phenotypic trait for selection of transformed host 
cells such as dihydrofolate reductase or metallothionein. 
Additionally, a vector might contain a replicative origin. 

As mentioned above, when higher animal host cells 
are employed, polyadenylation or transcription terminator
sequences from known mammalian genes need to be incorporated 
into the vector. An example of a terminator sequence is the 
polyadenylation sequence from the bovine growth hormone gene. 
Sequences for accurate splicing of the transcript may also be 
included. An example of a splicing sequence is the VP1 intron
from SV40 (Sprague, J. et al. (1983), J. Virol. 45:773-781).
Additionally, gene sequences to control replication
in the host cell may be incorporated into the vector such as
those found in bovine papilloma virus type-vectors.
Saveria-Campo, M. (1985), "Bovine Papilloma virus DNA a
Eukaryotic Cloning Vector" in DNA Cloning Vol. II a Practical
Approach Ed. D.M. Glover, IRL Press, Arlington, Virginia pp.
213-238.

The transformed cells are cultured by means well 
known in the art. For example, as published in Kuchler, R.J.
et al., (1977), Biochemical Methods in Cell Culture and
Virology. 

In addition to the above general procedures which 
can be used for preparing recombinant DNA molecules and 
transformed unicellular organisms in accordance with the 
practices of this invention, other known techniques and 
modifications thereof can be used in carrying out the practice 
of the invention. Any known system for expression of isolated 
genes is suitable for use in the present invention. For 
example, viral expression systems such as the baculovirus 
expression system are specifically contemplated within the 
scope of the invention. Many recent U.S. patents disclose 
plasmids, genetically engineered microorganisms, and methods 
of conducting genetic engineering which can be used in the 
practice of the present invention. For example, U.S. Pat. No. 
4,273,875 discloses a plasmid and a process of isolating the 
same. U.S. Pat. No. 4,304,863 discloses a process for 
producing bacteria by genetic engineering in which a hybrid 
plasmid is constructed and used to transform a bacterial host. 
U.S. Pat. No. 4,419,450 discloses a plasmid useful as a 
cloning vehicle in recombinant DNA work. U.S. Pat. No. 
4,362,867 discloses recombinant cDNA construction methods and 
hybrid nucleotides produced thereby which are useful in 
cloning processes. U.S. Pat. No. 4,403,036 discloses genetic 
reagents for generating plasmids containing multiple copies of 
DNA segments. U.S. Pat. No. 4,363,877 discloses recombinant 
DNA transfer vectors. U.S. Pat. No. 4,356,270 discloses a 
recombinant DNA cloning vehicle and is a particularly useful 
disclosure for those with limited experience in the area of 
genetic engineering since it defines many of the terms used in 
genetic engineering and the basic processes used therein. 
U.S. Pat. No. 4,336,336 discloses a fused gene and a method of 
making the same. U.S. Pat. No. 4,319,629 discloses plasmid 
vectors and the production and use thereof. U.S. Pat. No. 
4,332,901 discloses a cloning vector useful in recombinant 
DNA. Although some of these patents are directed to the 
production of a particular gene product that is not within the 
scope of the present invention, the procedures described 
therein can easily be modified to the practice of the 
invention described in this specification by those skilled in 
the art of genetic engineering. Transferring the isolated GFP 
cDNA to other expression vectors will produce constructs which 
improve the expression of the GFP polypeptide in E. coli or 
express GFP in other hosts. 

Detection of GFP and BFP Nucleic Acids and Proteins 

A. General detection methods 

The nucleic acids and proteins of the invention are 
detected, confirmed and quantified by any of a number of means 
well known to those of skill in the art. The unique quality 
of the inventive expressed proteins here is that they provide 
an enhanced fluorescence which can be readily and easily 
observed. Fluorescence assays for the expressed proteins are 
described in detail below. Other general methods for 
detecting both nucleic acids and corresponding proteins 
include analytic biochemical methods such as 
spectrophotometry, radiography, electrophoresis, capillary 
electrophoresis, high performance liquid chromatography 
(HPLC), thin layer chromatography (TLC), hyperdiffusion 
chromatography, and the like, and various immunological 
methods such as fluid or gel precipitin reactions, 
immunodiffusion (single or double), immunoelectrophoresis, 
radioimmunoassays (RIAs), enzyme-linked immunosorbent assays 
(ELISAs), immunofluorescent assays, and the like. The 
detection of nucleic acids proceeds by well known methods such 
as Southern analysis, northern analysis, gel electrophoresis, 
PCR, radiolabeling, scintillation counting, and affinity 
chromatography. 

A variety of methods of specific DNA and RNA 
measurement using nucleic acid hybridization techniques are 
known to those of skill in the art. For example, one method 
for evaluating the presence or absence of engineered GFP or 
BFP DNA in a sample involves a Southern transfer. Southern et 
al. (1975), J. Mol. Biol. 98:503. Briefly, the digested 
genomic DNA is run on agarose slab gels in buffer and 
transferred to membranes. Hybridization is carried out using 
the probes discussed above. Visualization of the hybridized 
portions allows the qualitative determination of the presence 
or absence of engineered GFP or BFP genes. 

Similarly, a Northern transfer may be used for the 
detection of engineered GFP or BFP mRNA in samples of RNA from 
cells expressing the engineered GFP or BFP gene. In brief, 
the mRNA is isolated from a given cell sample using an acid 
guanidinium-phenol-chloroform extraction method. The mRNA is 
then electrophoresed to separate the mRNA species and the mRNA 
is transferred from the gel to a nitrocellulose membrane. As 
with the Southern blots, labeled probes are used to identify 
the presence or absence of the engineered GFP or BFP 
transcript. 

The selection of a nucleic acid hybridization format 
is not critical. A variety of nucleic acid hybridization 
formats are known to those skilled in the art. For example, 
common formats include sandwich assays and competition or 
displacement assays. Hybridization techniques are generally 
described in "Nucleic Acid Hybridization, A Practical 
Approach," Ed. Hames, B.D. and Higgins, S.J., IRL Press, 1985; 
Gall and Pardue (1969), Proc. Natl. Acad. Sci. USA 63:378-383; 
and John, Birnstiel and Jones (1969), Nature 223:582-587. 

For example, sandwich assays are commercially useful 
hybridization assays for detecting or isolating nucleic acid 
sequences. Such assays utilize a "capture" nucleic acid 
covalently immobilized to a solid support and labelled 
"signal" nucleic acid in solution. The clinical sample will 
provide the target nucleic acid. The "capture" nucleic acid 
and "signal" nucleic acid probe hybridize with the target 
nucleic acid to form a "sandwich" hybridization complex. To 
be effective, the signal nucleic acid cannot hybridize with 
the capture nucleic acid. 

The nucleic acid sequences used in this invention 
can be either positive or negative probes. Positive probes 
bind to their targets and the presence of duplex formation is 
evidence of the presence of the target. Negative probes fail 
to bind to the suspect target and the absence of duplex 
formation is evidence of the presence of the target. For 
example, the use of a wild type specific nucleic acid probe or 
PCR primers may act as a negative probe in an assay sample 
where only the mutant engineered GFP or BFP is present. 

Labelled signal nucleic acids, whether those 
described herein or others known in the art, are used to detect 
hybridization. Complementary nucleic acids or signal nucleic 
acids may be labelled by any one of several methods typically 
used to detect the presence of hybridized polynucleotides. 
One common method of detection is the use of autoradiography 
with ³H, ¹²⁵I, ³⁵S, ¹⁴C, or ³²P-labelled probes or the like. 
Other labels include ligands which bind to labelled 
antibodies, fluorophores, chemiluminescent agents, enzymes, 
and antibodies which can serve as specific binding pair 
members for a labelled ligand. 

Detection of a hybridization complex may require the 
binding of a signal generating complex to a duplex of target 
and probe polynucleotides or nucleic acids. Typically, such 
binding occurs through ligand and anti-ligand interactions as 
between a ligand-conjugated probe and an anti-ligand 
conjugated with a signal. The binding of the signal 
generation complex is also readily amenable to accelerations 
by exposure to ultrasonic energy. 

The label may also allow indirect detection of the 
hybridization complex. For example, where the label is a 
hapten or antigen, the sample can be detected by using 
antibodies. In these systems, a signal is generated by 
attaching fluorescent or enzyme molecules to the antibodies or 
in some cases, by attachment to a radioactive label. 
(Tijssen, P. (1985), "Practice and Theory of Enzyme 
Immunoassays," Laboratory Techniques in Biochemistry and 
Molecular Biology, Burden, R.H., van Knippenberg, P.H., Eds., 
Elsevier, pp. 9-20.) 

The sensitivity of the hybridization assays may be 
enhanced through use of a nucleic acid amplification system 
which multiplies the target nucleic acid being detected. In 
vitro amplification techniques suitable for amplifying 
sequences for use as molecular probes or for generating 
nucleic acid fragments for subsequent subcloning are known. 
Examples of techniques sufficient to direct persons of skill 
through such in vitro amplification methods, including the 
polymerase chain reaction (PCR), the ligase chain reaction 
(LCR), Qβ-replicase amplification and other RNA polymerase 
mediated techniques (e.g., NASBA), are found in Berger, 
Sambrook, and Ausubel, as well as Mullis et al. (1987), U.S. 
Patent No. 4,683,202; PCR Protocols: A Guide to Methods and 
Applications (Innis et al., eds.) Academic Press Inc., San 
Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 
1990), Chem. Eng. News 36-47; J. NIH Res. (1991) 3:81-94; 
Kwoh et al. (1989), Proc. Natl. Acad. Sci. USA 86:1173; 
Guatelli et al. (1990), Proc. Natl. Acad. Sci. USA 87:1874; 
Lomell et al. (1989), J. Clin. Chem. 35:1826; Landegren et al. 
(1988), Science 241:1077-1080; Van Brunt (1990), Biotechnology 
8:291-294; Wu and Wallace (1989), Gene 4:560; Barringer et al. 
(1990), Gene 89:117; and Sooknanan and Malek (1995), 
Biotechnology 13:563-564. Improved methods of cloning 
in vitro amplified nucleic acids are described in Wallace et 
al., U.S. Pat. No. 5,426,039. Other methods recently 
described in the art are the nucleic acid sequence based 
amplification (NASBA™, Cangene, Mississauga, Ontario) and Q 
Beta Replicase systems. These systems can be used to directly 
identify mutants where the PCR or LCR primers are designed to 
be extended or ligated only when a select sequence is present. 
Alternatively, the select sequences can be generally amplified 
using, for example, nonspecific PCR primers and the amplified 
target region later probed for a specific sequence indicative 
of a mutation. 

Oligonucleotides for use as probes, e.g., in in 
vitro amplification methods, for use as gene probes, or as 
inhibitor components are typically synthesized chemically 
according to the solid phase phosphoramidite triester method 
described by Beaucage and Caruthers (1981), Tetrahedron Letts. 
22(20):1859-1862, e.g., using an automated synthesizer, as 
described in Needham-VanDevanter et al. (1984), Nucleic Acids 
Res. 12:6159-6168. Purification of oligonucleotides, where 
necessary, is typically performed by either native acrylamide 
gel electrophoresis or by anion-exchange HPLC as described in 
Pearson and Regnier (1983), J. Chrom. 255:137-149. The 
sequence of the synthetic oligonucleotides can be verified 
using the chemical degradation method of Maxam and Gilbert 
(1980) in Grossman and Moldave (eds.) Academic Press, New 
York, Methods in Enzymology 65:499-560. 

An alternative means for determining the level of 
expression of the engineered GFP or BFP gene is in situ 
hybridization. In situ hybridization assays are well known 
and are generally described in Angerer et al. (1987), Methods 
Enzymol. 152:649-660. In an in situ hybridization assay, cells 
are fixed to a solid support, typically a glass slide. If DNA 
is to be probed, the cells are denatured with heat or alkali. 
The cells are then contacted with a hybridization solution at 
a moderate temperature to permit annealing of engineered GFP 
or BFP specific probes that are labelled. The probes are 
preferably labelled with radioisotopes or fluorescent 
reporters. 

B. Fluorescence Assay 

When a fluorophore, such as a protein that is capable 
of fluorescing, is exposed to light of an appropriate 
wavelength, it will absorb and store light energy and then 
release the stored energy as light. The range of wavelengths 
that a fluorophore is capable of absorbing is the excitation 
spectrum and the range of wavelengths of light that a 
fluorophore is capable of emitting is the emission or 
fluorescence spectrum. The excitation and fluorescence spectra 
for a given fluorophore usually differ and may be readily 
measured using known instruments and methods. For example, 
scintillation counters and photometers (e.g., luminometers), 
photographic film, and solid state devices such as charge 
coupled devices may be used to detect and measure the emission 
of light. 

The nucleic acids, vectors, and mutant proteins provided 
herein, in combination with well known techniques for 
over-expressing recombinant proteins, make it possible to obtain 
unlimited supplies of homogeneous mutant GFPs and BFPs. These 
modified GFPs or BFPs having increased fluorescent activity 
replace wtGFP or other currently employed tracers in existing 
diagnostic and assay systems. Such currently employed tracers 
include radioactive atoms or molecules and color-producing 
enzymes such as horseradish peroxidase. 

The benefits of using the mutants of the present 
invention are at least four-fold: the modified GFPs and BFPs 
are safer than radioactive-based assays, modified GFPs and 
BFPs can be assayed quickly and easily, and large numbers of 
samples can be handled simultaneously, reducing overall 
handling and increasing efficiency. Of great significance, 
the expression and subcellular distribution of the fluorescent 
proteins within cells can be detected in living tissues 
without any other experimental manipulation than placing 
the cells on a slide and viewing them through a fluorescence 
microscope. This represents a vast improvement over methods 
of immunodetection that require fixation and subsequent 
labelling. 

The modified GFPs and BFPs of the present invention 
can be used in standard assays involving a fluorescent marker. 
For example, ligand-ligator binding pairs that can be modified 
with the mutants of the present invention without disrupting 
the ability of each to bind to the other can form the basis of 
an assay encompassed by the present invention. These and 
other assays are known in the art and their use with the GFPs 
and BFPs of the present invention will become obvious to one 
skilled in the art in light of the teachings disclosed herein. 
Examples of such assays include competitive assays wherein 
labeled and unlabeled ligands competitively bind to a ligator, 
and noncompetitive assays where a ligand is captured by a ligator 
and either measured directly or "sandwiched" with a secondary 
ligator that is labeled. Still other types of assays include 
immunoassays, single-step homogeneous assays, multiple-step 
heterogeneous assays, and enzyme assays. 

In a number of embodiments, the mutant GFPs and BFPs 
are combined with fluorescent microscopy using known 
techniques (see, e.g., Stauber et al., Virol. 213:439-454 
(1995)) or preferably with fluorescence activated cell sorting 
(FACS) to detect and optionally purify or clone cells that 
express specific recombinant constructs. For a brief overview 
of the FACS and its uses, see: Herzenberg et al., 1976, 
"Fluorescence activated cell sorting", Sci. Amer. 234:108; 
see also Flow Cytometry and Sorting, eds. Melamed, Mullaney and 
Mendelsohn, John Wiley and Sons, Inc., New York, 1979. 
Briefly, fluorescence activated cell sorters take a suspension 
of cells and pass them single file into the light path of a 
laser placed near a detector. The laser usually has a set 
wavelength. The detector measures the fluorescent emission 
intensity of each cell as it passes through the instrument and 
generates a histogram plot of cell number versus fluorescent 
intensity. Gates or limits can be placed on the histogram, 
thus identifying a particular population of cells. In one 
embodiment, the cell sorter is set up to select cells having 
the highest probe intensity, usually a small fraction of the 
cells in the culture, and to separate these selected cells 
away from all the other cells. The level of intensity at 
which the sorter is set and the fraction of cells which is 
selected depend on the condition of the parent culture and 
the criteria of the isolation. In general, the operator 
should first sort an aliquot of the culture, and record the 
histogram of intensity versus number of cells. The operator 
can then set the selection level and isolate an appropriate 
number of the most active cells. Currently, fluorescence 
activated cell sorters are equipped with automated cell 
cloning devices. Such a device enables one to instruct the 
instrument to singly deposit a selected cell into an 
individual growth well, where it is allowed to grow into a 
monoclonal culture. Thus, genetic homogeneity is established 
within the newly cloned culture. 
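
The gating procedure described above (record a histogram from a pilot aliquot, then set a selection level that isolates the brightest cells) can be sketched in Python. This is only an illustration under stated assumptions and not the software of any actual cell sorter: the function names `gate_brightest` and `histogram`, the `fraction` and `bin_width` parameters, and the sample readings are all hypothetical.

```python
# Hypothetical sketch of an intensity gate on per-cell fluorescence readings.
from math import ceil


def histogram(intensities, bin_width):
    """Count cells per intensity bin (cell number versus intensity)."""
    counts = {}
    for x in intensities:
        lower_edge = int(x // bin_width) * bin_width
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


def gate_brightest(intensities, fraction):
    """Return (threshold, selected) for the brightest `fraction` of cells.

    Cells tied with the threshold are all kept, so slightly more than
    `fraction` of the cells may be selected.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not intensities:
        return None, []
    n_keep = max(1, ceil(len(intensities) * fraction))
    ranked = sorted(intensities, reverse=True)
    threshold = ranked[n_keep - 1]  # dimmest reading still inside the gate
    selected = [x for x in intensities if x >= threshold]
    return threshold, selected


# Made-up readings from a pilot aliquot (arbitrary fluorescence units).
readings = [12, 15, 9, 230, 18, 11, 410, 14, 16, 13]
pilot_histogram = histogram(readings, 100)
threshold, selected = gate_brightest(readings, 0.2)
```

In this sketch the selection level is derived from the pilot histogram rather than fixed in advance, which mirrors the statement that the level depends on the condition of the parent culture.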

IV. General Applications for the GFP Mutants 

It should be self-evident that the mutant GFP and 
BFP sequences described here have unlimited uses, particularly 
as signal or reporter sequences for the co-expression of other 
nucleic acid sequences of interest and/or to track the 
location and/or movement of other sequences within the cell, 
within tissue and the like. For example, these reporter type 
sequences could be used to track the spread (or lack thereof) 
of a disease causal agent in drug screening assays or could 
readily be used in diagnostics. Some of the more interesting 
applications are described below. 



A. Protein Trafficking 

Normally, expressed mutant GFPs and BFPs are 
distributed throughout the cell (particularly mammalian 
cells), except for the nucleolus. However, as described 
below, when a GFP mutant is fused to the HIV-1 Rev protein, a 
hybrid molecule results which retains the Rev function and is 
localized mainly in the nucleolus where Rev is found. Fusion 
to the terminal domain of the HIV-1 Nef protein produces a 
hybrid protein detectable in the plasma membrane. Thus, the 
GFP mutants can be used to monitor the subcellular targeting 
and transport of proteins to which they are fused. 

B. Gene Therapy 

The mutant GFPs described here have interesting and 
useful applications in gene therapy. Gene therapy in general 
is the correction of genetic defects by insertion of exogenous 
cellular genes that encode a desired function into cells that 
lack that function, such that the expression of the exogenous 
gene a) corrects a genetic defect or b) causes the destruction 
of cells that are genetically defective. Methods of gene 
therapy are well known in the art; see, for example, Lu, M., 
et al. (1994), Human Gene Therapy 5:203; Smith, C. (1992), J. 
Hematotherapy 1:155; Cassel, A., et al. (1993), Exp. Hematol. 
21:585; Larrick, J.W. and Burck, K.L., Gene Therapy: 
Application of Molecular Biology, Elsevier Science Publishing Co., 
Inc., New York, New York (1991) and Kriegler, M., Gene Transfer 
and Expression: A Laboratory Manual, W.H. Freeman and Company, New 
York (1990), each incorporated herein by reference. One 
modality of gene therapy involves (a) obtaining from a patient 
a viable sample of primary cells of a particular cell type; 
(b) inserting into these primary cells a nucleic acid segment 
encoding a desired gene product; (c) identifying and isolating 
cells and cell lines that express the gene product; 
(d) re-introducing cells that express the gene product; (e) removing 
from the patient an aliquot of tissue including cells 
resulting from step c and their progeny; and (f) determining 
the quantity of the cells resulting from step c and their 
progeny, in said aliquot. The introduction into cells in step 
c of a polycistronic vector that encodes GFP or BFP in 
addition to the desired gene allows for the quick 
identification of viable cells that contain and express the 
desired gene. 

Another gene therapy modality involves inserting the 
desired nucleic acid into selected tissue cells in situ, for 
example into cancerous or diseased cells, by contacting the 
target cells in situ with retroviral vectors that encode the 
gene product in question. Here, it is important to quickly 
and reliably assess which and what proportion of cells have 
been transfected. Co-expression of GFP and BFP permits a 
quick assessment of the proportion of cells that are 
transfected, and of the levels of expression. 

C. Diagnostics 

One potential application of the GFP/BFP variants is 
in diagnostic testing. The GFP/BFP gene, when placed under 
the control of promoters induced by various agents, can serve 
as an indicator for these agents. Established cell lines or 
cells and tissues from transgenic animals carrying GFP/BFP 
expressed under the desired promoter will become fluorescent 
in the presence of the inducing agent. 

Viral promoters which are transactivated by the 
corresponding virus, promoters of heat shock genes which are 
induced by various cellular stresses as well as promoters 
which are sensitive to organismal responses, e.g. 
inflammation, can be used in combination with the described 
GFP/BFP mutants in diagnostics. 

In addition, the effect of selected culture 
conditions and components (salt concentrations, pH, 
temperature, trans-acting regulatory substances, hormones, 
cell-cell contacts, ligands of cell surface and internal 
receptors) can be assessed by incubating cells in which 
sequences encoding the fluorescent proteins provided herein 
are operably linked to nucleic acids (especially regulatory 
elements such as promoters) derived from a selected gene, and 
detecting the expression and location of fluorescence. 

D. Toxicology 

Another application of the GFP/BFP-based 
methodologies is in the area of toxicology. Assessment of the 
mutagenic potential of any compound is a prerequisite for its 
use. Until recently, the Ames assay in Salmonella and tests 
based on chromosomal aberrations or sister chromatid exchanges 
in cultured mammalian cells were the main tools in toxicology. 
However, both assays are of limited sensitivity and 
specificity and do not allow studies on mutation induction in 
various organs or tissues of the intact organism. 

The introduction of transgenic mice with a 
mutational target in a shuttle vector has made possible the 
detection of induced mutations in different tissues in vivo. 
The assay involves DNA isolation from tissues of exposed mice, 
packaging of the target DNA into bacteriophage lambda 
particles and subsequent infection of E. coli. The mutational 
target in this assay is either the lacZ or lacI genes and 
quantitation of blue vs white plaques on the bacterial lawn 
allows for mutagenic assessment. 

GFP/BFP could significantly simplify both the tissue 
culture and transgenic mouse procedures. Expression of 
GFP/BFP under the control of a repressor, which in turn is 
driven by the promoter of a constitutively expressed gene, 
will establish a rapid method for evaluating the mutagenic 
potential of an agent. The presence of fluorescent cells, 
following exposure of a cell line, tissue or whole animal 
carrying the GFP/BFP-based detection construct, will reflect 
the mutagenicity of the compound in question. GFP/BFP 
expressed under the control of the target DNA, the repressor 
gene, will only be synthesized when the repressor is 
inactivated or turned off or the repressor recognition 
sequences are mutated. Direct visualization of the detector 
cell line or tissue biopsy can qualitatively assess the 
mutagenicity of the agent, while FACS of the dissociated cells 
can provide for quantitative analysis. 




E. Drug Screening 

The GFP/BFP detection system could also 
significantly expedite and reduce the cost of some current 
drug screening procedures. A dual color screening system 
(DCSS), in which GFP is placed under the promoter of a target 
gene and BFP is expressed from a constitutive promoter, could 
provide for rapid analysis of agents that specifically affect 
the target gene. Established cell lines with the DCSS could 
be screened with hundreds of compounds in a few hours. The 
desired drug will only influence the expression of GFP. 
Non-specific or cytotoxic effects will be detected by the 
second marker, BFP. The advantages of this system are that no 
exogenous substances are required for GFP and BFP detection, 
the assay can be used with single cells, cell populations, or 
cell extracts, and that the same detection technology and 
instrumentation is used for very rapid and non-destructive 
detection. 
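
The decision logic of the dual color screening system can be sketched as follows. This is a minimal illustration, not a validated assay procedure: the function name `classify_compound`, the comparison against untreated-control signals, and the 20% `tolerance` cutoff are all assumptions introduced here and do not appear in the description above.

```python
# Hypothetical scoring of one treated well in a dual color screen.
def classify_compound(gfp, bfp, gfp_ctrl, bfp_ctrl, tolerance=0.2):
    """Compare treated-well signals with untreated-control signals.

    `tolerance` is an assumed cutoff: a signal counts as changed when it
    differs from its control by more than this fraction.
    """
    if gfp_ctrl <= 0 or bfp_ctrl <= 0:
        raise ValueError("control signals must be positive")
    gfp_changed = abs(gfp / gfp_ctrl - 1.0) > tolerance
    bfp_changed = abs(bfp / bfp_ctrl - 1.0) > tolerance
    if bfp_changed:
        # The constitutive BFP marker moved: non-specific or cytotoxic effect.
        return "non-specific or cytotoxic"
    if gfp_changed:
        # Only the target-promoter GFP reporter moved.
        return "specific for target gene"
    return "no effect"


specific = classify_compound(gfp=30, bfp=98, gfp_ctrl=100, bfp_ctrl=100)
toxic = classify_compound(gfp=30, bfp=25, gfp_ctrl=100, bfp_ctrl=100)
inert = classify_compound(gfp=95, bfp=105, gfp_ctrl=100, bfp_ctrl=100)
```

The BFP check runs first because a change in the constitutive marker makes any accompanying GFP change uninterpretable as a target-specific effect.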

The search for antiviral agents which specifically 
block viral transcription without affecting cellular 
transcription, could be significantly improved by the DCSS. 
In the case of HIV, appropriate cell lines expressing GFP 
under the HIV LTR and BFP under a cellular constitutive 
promoter, could identify compounds which selectively inhibit 
HIV transcription. Reduction of only the green but not the 
blue fluorescent signal will indicate drug specificity for the 
HIV promoter. Similar approaches could also be designed for 
other viruses. 

Furthermore, the search for antiparasitic agents 
could also be helped by the DCSS. Established cell lines or 
transgenic nematodes or even parasitic extracts where 
expression of GFP depends on parasite-specific trans splicing 
sequences while BFP is under the control of host-specific cis 
splicing elements, could provide for rapid screen of selective 
antiparasitic drugs. 

The invention will be more readily understood by 
reference to the following specific examples which are 
included for purposes of illustration only and are not 
intended to limit the invention unless so stated. 

EXAMPLES 

The following general protocol was used to generate 
mutant GFP- or BFP-encoding nucleic acids, transform host 
cells, and express the mutant GFP and BFP proteins: 

• Clone a nucleic acid that encodes either wtGFP or 
BFP (Tyr67→His), under the control of eukaryotic or 
prokaryotic promoters, into a standard ds-DNA plasmid 

• Convert the plasmid vector to a ss-DNA by standard 
methods 

• Anneal the ss-DNA to 40-50 nucleotide DNA oligomers 
having base mismatches at the site(s) intended to be 
engineered 

• Convert the ss-DNA to a closed ds-DNA plasmid vector by 
use of DNA polymerase and standard protocols 

• Identify plasmids containing the desired mutations by 
restriction analysis following plasmid DNA isolation from 
E. coli strains transformed with the mutagenized DNA 

• Verify the presence of mutations by DNA sequencing 

• Transfect human transformed embryonic kidney 293 cells 
with equal amounts of DNA from the appropriate plasmids 

• Compare the fluorescence intensity of the signals 

Nucleic acids and vectors 

The wtGFP cDNA (SEQ ID NO:1) was obtained from Dr. 
Chalfie of Columbia University. All mutants described were 
obtained by modifying this wtGFP sequence as detailed below. 

The vectors used to clone and to express the GFPs 
and BFPs are derivatives of the commercially available 
plasmids pcDNA3 (Invitrogen, San Diego, CA), pBSSK+ 
(Stratagene, La Jolla, CA) and pET11a (Novagen, Madison, WI). 

wtGFP protein expression in mammalian cells 

Several vectors for the expression of GFP in 
mammalian cells were constructed: 

pFRED4 carries the wtGFP sequences under the control of the 
cytomegalovirus (CMV) early promoter and the polyadenylation 
signal of the Human Immunodeficiency Virus-1 (HIV) 3' Long 
Terminal Repeat (LTR). To derive pFRED4 we amplified the GFP 
coding sequence from plasmid #TU58 (Chalfie et al., 1994) by 
the polymerase chain reaction (PCR). For PCR amplification of 
the GFP coding region, oligonucleotides #16417 and #16418 were 
used as primers. Oligonucleotide #16417: 
5'-GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA-3' (SEQ ID NO:3), 
containing the BssHII recognition sequence and the translation 
initiation sequence of the HIV-1 Tat protein, was the sense 
primer. The antisense primer, #16418: 
5'-GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG-3' (SEQ ID NO:4), 
contained the BamHI recognition sequence. The amplified 
fragment was digested with BssHII and BamHI and cloned into 
BssHII and BamHI digested pCMV37M1-10D, a plasmid containing 
the CMV early promoter and the HIV-1 p37gag region, followed 
by several cloning sites and the HIV-1 3' LTR. Thus the 
p37gag gene was replaced by GFP, resulting in pFRED4. 
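
As a quick check on the primer design just described, the two primer sequences can be scanned for the restriction sites used in cloning. The sketch below is illustrative only; the helper names `find_sites` and `reverse_complement` are hypothetical, and the recognition sequences used (GCGCGC for BssHII, GGATCC for BamHI) are the standard ones for those enzymes rather than values stated in this text.

```python
# Recognition sequences written 5'->3'; both sites are palindromic.
SITES = {"BssHII": "GCGCGC", "BamHI": "GGATCC"}

SENSE_16417 = "GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA"      # SEQ ID NO:3
ANTISENSE_16418 = "GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG"  # SEQ ID NO:4


def find_sites(seq, sites=SITES):
    """Map each enzyme name to the 0-based positions where its site starts."""
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        hits[enzyme] = positions
    return hits


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


sense_hits = find_sites(SENSE_16417)
antisense_hits = find_sites(ANTISENSE_16418)
# Read on the sense strand, the antisense primer ends with a TAA stop
# codon immediately followed by the BamHI site.
sense_view_of_antisense = reverse_complement(ANTISENSE_16418)
```

Each primer carries exactly one of the two sites, which is what allows the amplified fragment to be cloned directionally after BssHII and BamHI digestion.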

In a second step, the 1485bp fragment from pFRED4, 
generated from StuI and BamHI double digestion, was subcloned 
into the 4747bp vector derived from the NruI and BamHI double 
digestion of pcDNA3. The resulting plasmid, pFRED7 (SEQ ID 
NO:5), expresses GFP under the control of the early CMV 
promoter and the bovine growth hormone polyadenylation signal. 

Bacterial expression 

For bacterial expression, we constructed plasmid 
pBSGFP (SEQ ID NO:6), a pBSSK+ derivative carrying wtGFP. 
pBSGFP was generated by inserting the GFP containing region of 
pFRED4, digested with BssHII and BamHI and subsequently 
treated with Klenow, into the EcoRV digested pBSSK+ vector. 
In pBSGFP the wtGFP is fused downstream to the 43 amino acids 
of the alpha peptide of beta galactosidase, present in the 
pBSSK+ polylinker region. The added amino acids at the 
N-terminus of wtGFP have no apparent effect on the GFP signal, 
as judged from subsequent plasmids containing precise 
deletions of the extra amino acids. 

For GFP overexpression and purification we generated 
plasmid pFRED13 (SEQ ID NO:7) by ligating the 717bp fragment 
from pFRED7 digested with NheI and BamHI, to the 5644bp 
fragment resulting from the NheI and BamHI double digestion of 
pET11a. In pFRED13, GFP is synthesized under the control of 
the bacteriophage T7 phi10 promoter. 

The oligonucleotides used for GFP mutagenesis were 
synthesized by the DNA Support Services of the ABL Basic 
Research Program of the National Cancer Institute. DNA 
sequencing was performed by the PCR-assisted fluorescent 
terminator method (ReadyReaction DyeDeoxy Terminator Cycle 
Sequencing Kit, ABI, Columbia, MD) according to the 
manufacturer's instructions. Sequencing reactions were 
resolved on the ABI Model 373A DNA Sequencing System. 
Sequencing data were analyzed using the Sequencher program 
(Gene Codes, Ann Arbor, MI). 

Enzymes were purchased from New England Biolabs 
(Beverly, MA) and used according to conditions described by 
the supplier. Chemicals used for the purification of wild 
type and mutant proteins were purchased from SIGMA (St. Louis, 
MO). Tissue culture media were obtained from Biofluids 
(Rockville, MD) and GIBCO/BRL (Gaithersburg, MD). Competent 
bacterial cells were purchased from GIBCO/BRL. 

Preparation of mutants 

Initially, plasmid pBSGFP was used to mutagenize the 
GFP coding sequence by single-stranded DNA site directed 
mutagenesis, as described by Schwartz et al. (1992) J. Virol. 
66:7176. In addition to changing specific codons, our 
strategy was also to improve GFP expression by replacing 
potential inhibitory nucleotide sequences without altering the 
GFP amino acid sequence. This approach has been successfully 
employed in the past for other proteins (Schwartz et al. 
(1992) J. Virol. 66:7176). 

For the pBSGFP mutagenesis the following 
oligonucleotides were used: 

#17422 (SEQ ID NO:8): 
5'-CAATTTGTGTCCCAGAATGTTGCCATCTTCCTTGAAGTCAATACCTTT-3' 

#17423 (SEQ ID NO:9): 
5'-GTCTTGTAGTTGCCGTCATCTTTGAAGAAGATGCTCCTTTCCTGTAC-3' 

#17424 (SEQ ID NO:10): 
5'-CATGGAACAGGCAGTTTGCCAGTAGTGCAGATGAACTTCAGGGTAAGTTTTC-3' 

#17425 (SEQ ID NO:11): 
5'-CTCCACTGACAGAGAACTTGTGGCCGTTAACATCACCATC-3' 

#17426 (SEQ ID NO:12): 
5'-CCATCTTCAATGTTGTGGCGGGTCTTGAAGTTCACTTTGATTCCATT-3' 

#17465 (SEQ ID NO:13): 
5'-CGATAAGCTTGAGGATCCTCAGTTGTACAGTTCAtCCATGC-3' 

Oligonucleotide #17426 introduces a mutation in GFP,
converting the isoleucine (Ile) at position 168 into threonine
(Thr). The Ile168Thr change has been shown to alter the GFP
spectrum and also to increase the intensity of GFP
fluorescence by almost two-fold at the emission maxima (Heim
et al. (1994), supra).

The mutagenesis mixture was used to transform DH5α
competent E. coli cells. Ampicillin-resistant colonies were
obtained and examined for their fluorescent properties by
excitation with UV light. One colony, significantly brighter
than the rest, was apparent on the agar plate. This colony
was further purified, and the plasmid DNA was isolated and used to
transform DH5α competent bacteria. This time all the colonies
were bright green when excited with UV light, indicating
that the bright green fluorescence was associated with the
presence of the plasmid. The sequence of the GFP segment
(SEQ ID NO:14, representing only the segment and not the whole
plasmid) of this plasmid, called pBSGFPsg11, was then
determined. The sequence analysis revealed that in addition
to the designed nucleotide changes, which do not alter the
amino acid sequence of GFP, and the Ile168Thr mutation, a
second, spontaneous mutation had occurred. A thymidine at
position 322 of SEQ ID NO:14, which is the GFP-coding region
of the pBSGFPsg11 DNA, was replaced by a cytosine. This
nucleotide change converts the phenylalanine (Phe) at position
65 of the GFP amino acid sequence into a leucine (Leu). A
series of experiments, which will be described below,
demonstrated that the Phe65Leu mutation was indeed responsible
for the increase in the intensity of the fluorescent GFP
signal.

In subsequent experiments, involving generation of 
rationally designed GFP mutant combinations to be detailed 
below, we also used the single-stranded DNA site directed 
mutagenesis approach. This time, however, the template DNAs 
were pFRED7 derivatives instead of pBSGFP. 

Transfection and expression

The 293 cell line, an adenovirus-transformed human
embryonal kidney cell line (Graham et al. (1977), J. Gen.
Virol. 5:59), was used for protein expression analysis. The
cells were cultured in Dulbecco's modified culture medium
(DMEM) supplemented with 10% heat-inactivated fetal bovine
serum (FBS, Biofluids).

Transfection was performed by the calcium phosphate
coprecipitation technique as previously described (Graham et
al. (1973), Virol. 52:456; Felber et al. (1990), J. Virol.
64:3734). Plasmid DNA was purified on Qiagen columns according
to the manufacturer's instructions (Qiagen). A mix of 5 to 10
μg of total DNA per ml of final precipitate was overlaid on
the cells in 60 mm or 6- and 12-well tissue culture plates
(Falcon), using 0.5, 0.25 and 0.125 ml of precipitate,
respectively. After overnight incubation, the cells were
washed, placed in medium without phenol red and measured in a
plate spectrofluorometer, e.g., Cytofluor II (Perceptive
Biosystems, Framingham, MA).
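
As a rough illustration of the quantities quoted above, the following sketch computes how much DNA reaches each dish or well. The dictionary and function names are hypothetical, and the calculation assumes the DNA is evenly distributed through the precipitate.

```python
# Precipitate volumes quoted in the text (ml), keyed by plate format.
PRECIPITATE_ML = {"60 mm dish": 0.5, "6-well": 0.25, "12-well": 0.125}


def dna_per_well_ug(dna_ug_per_ml):
    """Return micrograms of DNA delivered per dish or well.

    dna_ug_per_ml: DNA concentration in the final precipitate (ug/ml).
    """
    return {fmt: dna_ug_per_ml * vol for fmt, vol in PRECIPITATE_ML.items()}


# The text gives a range of 5 to 10 ug DNA per ml of precipitate.
low = dna_per_well_ug(5.0)
high = dna_per_well_ug(10.0)
for fmt in PRECIPITATE_ML:
    print(f"{fmt}: {low[fmt]:.3f} to {high[fmt]:.3f} ug DNA")
```

Because the precipitate volume halves with each smaller format, the DNA dose per well halves as well, while the concentration seen by the cells stays the same.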

Purification of wild-type and mutant proteins

E. coli strains carrying pFRED13 or other pET11a
derivatives with mutant GFP genes were used for the
overproduction and purification of the wt and mutant GFPs or
BFPs. The cells were grown in 1 liter of LB broth containing 100
μg/ml ampicillin at 32°C to a density of 0.6-0.8 optical
density units at 600 nm. At this point, the cells were
induced with 0.6 mM IPTG and incubated for four more hours.
Following harvesting of the cell pellets, cellular extracts
were prepared as described by Johnson, B.H. and Hecht, M.H.,
1994, Biotechnol. 12:1357.
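
For readers scaling the culture step, the sketch below converts the stated concentrations into reagent masses for a given culture volume. The IPTG molecular weight (238.30 g/mol) is a standard reference value that does not appear in this text, and the function names are illustrative only.

```python
IPTG_MW_G_PER_MOL = 238.30  # assumption: reference molecular weight of IPTG


def ampicillin_mg(volume_l, conc_ug_per_ml=100.0):
    """Mass of ampicillin (mg) for a culture of volume_l liters."""
    total_ug = conc_ug_per_ml * volume_l * 1000.0  # ug/ml times ml
    return total_ug / 1000.0                       # ug to mg


def iptg_mg(volume_l, conc_mm=0.6):
    """Mass of IPTG (mg) for a final concentration of conc_mm millimolar."""
    millimoles = conc_mm * volume_l                # mmol/l times l
    return millimoles * IPTG_MW_G_PER_MOL          # mmol times g/mol gives mg


print(f"Ampicillin for 1 l: {ampicillin_mg(1.0):.1f} mg")
print(f"IPTG for 1 l:       {iptg_mg(1.0):.2f} mg")
```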

GFPs and BFPs were purified from the cellular
extracts as follows: Ammonium sulfate (AS) was first added to
the extracts (50 g AS per 100 g supernatant) to precipitate the
proteins. The precipitates were collected by centrifugation
at 7500 x g for 15 min and the pellets were dissolved in 5 ml
of 1 M AS. The samples were then loaded on a phenylsepharose
column (HR10/10, Pharmacia, Piscataway, NJ) and washed with 20
mM 2-[N-morpholino]ethanesulfonic acid (MES) pH 5.6 and 1 M
AS. Proteins were eluted with a 45 ml gradient to 20 mM MES,
pH 5.6. Fractions containing the GFP or BFP protein were
colored even under visible light.

Green or blue-colored fractions were further 
purified on Q-sepharose (Mono Q, HR5/5, Pharmacia) with a 20 
ml gradient from 20 mM Tris pH 7.0 to 20 mM Tris pH 7.0, 0.25 
M NaCl. 

The AS precipitation step was performed at 4°C
while the chromatographic procedures were performed at room
temperature.

Determination of protein concentration

Protein concentrations were determined using the 
commercially available Bradford protein assay (BioRad, 
Hercules, CA) with bovine IgG protein as a standard. 

Analytical polyacrylamide gels

Analytical polyacrylamide gel electrophoresis was 
used to visualize the degree of purity of the purified GFP or 
BFP proteins. In all cases, 1 mm thick, 12% acrylamide gels 
(containing 0.1% SDS, in Tris buffer, pH 7.4) were used, and 
electrophoresis was performed for 2 hours at 120 V. Gels were 
stained with Coomassie Blue to visualize the proteins. 
Fluorescence measurements

Excitation and emission spectra of solutions of the
fluorescent proteins were obtained using a Perkin Elmer LS50B
spectrofluorimeter (Perkin Elmer, Advanced Biosystems, Foster
City, CA).

The relative fluorescence data for the GFP mutants
in Table I below were obtained by comparing the cellular
fluorescence of the GFP mutants expressed in the transformed
human embryonic kidney cell line 293 with that of wtGFP expressed in
the same cell line. Likewise, the relative fluorescence data
for the BFP mutants in Table I below were obtained by
comparing the cellular fluorescence of the BFP mutants
expressed in 293 cells with that of BFP(Tyr67→His) expressed in the
same cell line. Equal amounts of DNA encoding wild type or
mutant proteins were introduced into 293 cells. Cellular
fluorescence was quantified 24 h or 48 h post-transfection
using Cytofluor II.
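
The comparison described above amounts to a ratio of plate-reader signals. The sketch below shows one plausible way to compute it; the optional blank (untransfected cell) subtraction is an assumption that the text does not describe, and the function name and readings are purely illustrative.

```python
from statistics import mean


def fold_increase(mutant_reads, reference_reads, blank_reads=None):
    """Ratio of mean mutant signal to mean reference signal.

    blank_reads: optional readings from untransfected cells; subtracting
    them is an assumption made here, not a step stated in the text.
    """
    blank = mean(blank_reads) if blank_reads else 0.0
    mutant = mean(mutant_reads) - blank
    reference = mean(reference_reads) - blank
    if reference <= 0:
        raise ValueError("reference signal must exceed the blank")
    return mutant / reference


# Hypothetical readings in arbitrary fluorescence units.
print(fold_increase([1900, 2100], [100, 100]))            # no blank
print(fold_increase([1950, 2150], [150, 150], [50, 50]))  # with blank
```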

A list of GFP mutant proteins indicating the 
introduced amino acid mutations is shown in Table I. 



TABLE I: GFP and BFP mutants

PROTEIN     Amino Acid Position
            65    66    67    164   168   239
wt GFP      F     S     Y     V     I     K
SG12        L
SG11        L                       T     N
SG25        L     C                 T     N
BFP                     H
SB42        L           H
SB49                    H     A
SB50        L           H     A

Example 1: SG12

A number of the unique mutants described herein
derive from the discovery of an unplanned and unexpected
mutation called "SG12", obtained in the course of site-directed
mutagenesis experiments, wherein a phenylalanine at
position 65 of wtGFP was converted to leucine. SG12 was
prepared as follows: Two plasmids carrying SG12 (SEQ ID NO:15)
were generated, pFRED12 for expression in mammalian cells, and
pFRED16 for expression in E. coli and protein purification.
pFRED12 was constructed by ligating the 1557 bp fragment from
the double digestion of pFRED7 with AvrII and PmlI into the
4681 bp fragment generated from the AvrII and PmlI digestion
of pFRED11 (see below). pFRED16 was derived by subcloning the
717 bp segment resulting from the digestion of pFRED12 with
NheI and BamHI into the 5644 bp fragment of the pET11a vector
digested with the same restriction enzymes.

The specific activity of SG12 was about 9-12 times
that of wtGFP. See Table II.



Example 2: SG11

A mutant referred to as "SG11," which combined the
phenylalanine 65 to leucine alteration with an isoleucine 168
to threonine substitution and a lysine 239 to asparagine
substitution, gave a further enhanced fluorescence intensity.
SG11 was prepared as follows: Two plasmids carrying SG11 (SEQ
ID NO:16) were generated: pFRED11 for expression in mammalian
cells and pFRED15 for expression in E. coli and protein
purification. pFRED11 was constructed by ligating the 717 bp
region from pBSGFPsg11 DNA digested with NheI and BamHI to the
5221 bp fragment derived from the digestion of pFRED7 with the
same enzymes. pFRED15 was generated by subcloning the 717 bp
segment resulting from the digestion of pFRED11 with NheI and
BamHI into the 5644 bp fragment of the pET11a vector, digested
with the same restriction enzymes.

The mutant SG11 encodes an engineered GFP wherein
the alteration comprises the conversion of phenylalanine 65 to
leucine and the conversion of isoleucine 168 to threonine.
The additional alteration of the C-terminal Lys 239 to Asn is
without effect; the C-terminal Lys or Asn may be deleted
without affecting fluorescence. The specific activity of SG11
is about 19-38 times that of wtGFP. See Table II.

Example 3: SG25

A third and further improved GFP mutant was obtained
by further mutating "SG11." This mutant is referred to as
"SG25" and comprises, in addition to the SG11 substitutions,
an additional substitution of a cysteine for the serine
normally found at position 66 in the sequence. SG25 was
prepared as follows: Two plasmids carrying SG25 (SEQ ID NO:17)
were generated: pFRED25 for expression in mammalian cells and
pFRED63 for expression in E. coli and protein purification.
pFRED25 was constructed by site-directed mutagenesis of
pFRED11, using oligonucleotide #18217 (SEQ ID NO:18):
5'-CATTGAACACCATAGCACAGAGTAGTGACTAGTGTTGGCC-3'. This
oligonucleotide incorporates the Ser66Cys mutation into SG11.
Ser66Cys had been shown both to alter the GFP excitation
maxima without significant change in the emission spectrum and
to increase the intensity of the fluorescent signal of
GFP (Heim et al., 1995).

pFRED63 was generated by subcloning the 717 bp
segment resulting from the digestion of pFRED25 with NheI and
BamHI into the 5644 bp fragment of the pET11a vector, digested
with the same restriction enzymes.

The mutant SG25 encodes an engineered GFP wherein
the alteration comprises the conversion of phenylalanine 65 to
leucine, the conversion of isoleucine 168 to threonine and the
conversion of serine 66 to cysteine. As with SG11, the
additional alteration of the C-terminal lysine 239 to
asparagine is without effect; the C-terminal lysine or
asparagine may be deleted without affecting fluorescence. The
specific activity of SG25 is about 56 times that of wtGFP.
See Table II.

Example 4: Additional green fluorescent mutants

Additional alterations at different amino acids of
the wtGFP, when combined with SG11 and SG25, yielded proteins
having at least 5X greater cellular fluorescence compared to
the wtGFP. A non-limiting list of these mutations is provided
below:

GFP variants with enhanced cellular fluorescence

Protein   Altered amino acids
SG20      F65L, S66T, I168T, K239N
SG21      F65L, S66A, I168T, K239N
SG27      Y40L, F65L, I168T, K239N
SG30      F47L, F65L, I168T, K239N
SG32      F72L, F65L, I168T, K239N
SG43      F65L, I168T, Y201L, K239N
SG46      F65L, V164A, I168T, K239N
SG72      F65L, S66C, V164A, I168T, K239N
SG91      F65L, S66C, F100L, I168T, K239N
SG94      F65L, S66C, Y107L, I168T, K239N
SG95      F65L, S66C, F115L, I168T
SG96      F65L, S66C, F131L, I168T, K239N
SG98      F65L, S66C, Y146L, I168T, K239N
SG100     F65L, S66C, Y152L, I168T, K239N
SG101     F65L, S66C, I168T, Y183L, K239N
SG102     F65L, S66C, I168T, F224L, K239N
SG103     F65L, S66C, I168T, Y238L, K239N
SG106     F65L, S66T, V164A, I168T, K239N
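
The substitution lists in this example use the usual shorthand (wild-type residue, position, new residue). As a hedged illustration, the sketch below applies such a list to the wild-type protein and checks that each stated wild-type residue really occurs at that position. The one-letter sequence is a hand transcription of SEQ ID NO:2 made for this example, and the function name is hypothetical.

```python
import re

# One-letter transcription of SEQ ID NO:2 (239 residues); transcribed by
# hand for this illustration, so treat it as an assumption.
WT_GFP = (
    "MASKGEELFTGVVPIL" "VELDGDVNGHKFSVSG" "EGEGDATYGKLTLKFI"
    "CTTGKLPVPWPTLVTT" "FSYGVQCFSRYPDHMK" "RHDFFKSAMPEGYVQE"
    "RTIFFKDDGNYKTRAE" "VKFEGDTLVNRIELKG" "IDFKEDGNILGHKLEY"
    "NYNSHNVYIMADKQKN" "GIKVNFKIRHNIEDGS" "VQLADHYQQNTPIGDG"
    "PVLLPDNHYLSTQSAL" "SKDPNEKRDHMVLLEF" "VTAAGITHGMDELYK"
)


def apply_mutations(seq, mutations):
    """Apply substitutions such as 'F65L' (1-based positions) to seq."""
    residues = list(seq)
    for mutation in mutations:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
        if match is None:
            raise ValueError(f"cannot parse mutation {mutation!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if residues[pos - 1] != old:
            raise ValueError(
                f"{mutation}: position {pos} is {residues[pos - 1]}, not {old}"
            )
        residues[pos - 1] = new
    return "".join(residues)


sg25 = apply_mutations(WT_GFP, ["F65L", "S66C", "I168T", "K239N"])
print(sg25[64:67])  # residues 65 to 67 of SG25
```

The wild-type check is the useful part of the design: a mistyped position raises an error instead of silently producing the wrong protein.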



Example 5: SB42

The blue fluorescent proteins described here and
below were derived from the known GFP mutant (Heim et al.,
PNAS, 1994) wherein histidine is substituted for tyrosine at
position 67. We have designated this known mutant
BFP(Tyr67→His). BFP(Tyr67→His) has a shifted emission
spectrum. It emits blue light, i.e., it is a blue fluorescent
protein (BFP).

By introducing the same mutation in BFP(Tyr67→His)
that was used to generate SG12, i.e., leucine for
phenylalanine at position 65, we created a new mutant that has
unexpectedly high fluorescence that we refer to as "SuperBlue-42"
(SB42). SB42 was prepared as follows: Two plasmids
carrying SB42 (SEQ ID NO:19) were generated: pFRED42 for
expression in mammalian cells and pFRED65 for expression in E.
coli and protein purification. pFRED42 was constructed by
site-directed mutagenesis of pFRED12, using oligonucleotide
#bio25 (5'-CATTGAACACCATGAGAGAGAGTAGTGACTAGTGTTGGCC-3') (SEQ ID
NO:20). This oligonucleotide incorporates the Tyr67→His
mutation into SG12, thus generating the Phe65Leu, Tyr67→His
double mutant.

pFRED65 was created by subcloning the 717 bp segment
resulting from the digestion of pFRED42 with NheI and BamHI into
the 5644 bp fragment of the pET11a vector, digested with the
same restriction enzymes.

The mutant SB42 encodes an engineered BFP wherein
the alterations comprise the conversion of tyrosine 67 to
histidine and the conversion of phenylalanine 65 to leucine.
The specific activity of SB42 is about 27 times that of
BFP(Tyr67→His). See Table II.

Example 6: SB49

An independent mutation of BFP(Tyr67→His) which
substitutes the valine at position 164 with an alanine is
referred to as "SB49." SB49 was prepared as follows: Plasmid
pFRED49 expresses SB49 (SEQ ID NO:21) in mammalian cells.
pFRED49 was generated by site-directed mutagenesis of pFRED12,
using oligonucleotides #19059 and #bio24. Oligonucleotide
#19059 (5'-CTTCAATGTTGTGGCGGATCTTGAAGTTCGCTTTGATTCCATTC-3')
(SEQ ID NO:22) introduces the Val164Ala mutation in SG12 while
oligonucleotide #bio24
(5'-CATTGAACACCATGAGAGAAAGTAGTGACTAGTGTTGGCC-3') (SEQ ID NO:23)
reverts the Phe65Leu alteration to the wt sequence and, at the
same time, incorporates the Tyr67→His mutation.

The mutant SB49 encodes an engineered BFP wherein
the alterations comprise the conversion of tyrosine 67 to
histidine, and the conversion of valine 164 to alanine. The
specific activity of SB49 was about 37 times that of
BFP(Tyr67→His). See Table II.
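
The mutagenic oligonucleotides quoted in Examples 3, 5 and 6 are written as the antisense strand, so the residues they encode are easiest to read after reverse-complementing. The sketch below does this with the standard genetic code. The reading-frame offset of 2 is an inference from aligning the reverse complement with SEQ ID NO:1 (first full codon = Pro59) and is not stated in the text; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, offset=0):
    """Translate complete codons of seq, starting at index offset."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons)


OLIGOS = {
    "#18217": "CATTGAACACCATAGCACAGAGTAGTGACTAGTGTTGGCC",
    "#bio25": "CATTGAACACCATGAGAGAGAGTAGTGACTAGTGTTGGCC",
    "#bio24": "CATTGAACACCATGAGAGAAAGTAGTGACTAGTGTTGGCC",
}

# Assumed frame: the first full codon of each reverse complement encodes
# residue 59, so residues 65, 66 and 67 sit at string indices 6, 7 and 8.
for name, oligo in OLIGOS.items():
    peptide = translate(reverse_complement(oligo), offset=2)
    print(name, peptide, "residues 65-67:", peptide[6:9])
```

Read this way, #18217 encodes Leu-Cys-Tyr at positions 65 to 67, #bio25 encodes Leu-Ser-His, and #bio24 encodes Phe-Ser-His, which agrees with the substitutions described for SG25, SB42 and SB49.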

Example 7: SB50

A combination of the above two BFP mutations
resulted in "SB50," which gave an even greater fluorescence
enhancement than either of the previous mutations. SB50 was
prepared as follows: Two plasmids carrying SB50 (SEQ ID NO:24)
were generated: pFRED50 for expression in mammalian cells
and pFRED67 for expression in E. coli and protein
purification. pFRED50 was constructed by site-directed
mutagenesis of pFRED12, using oligonucleotides #19059 and
#bio25.

pFRED67 was created by subcloning the 717 bp segment
resulting from the digestion of pFRED50 with NheI and BamHI into
the 5644 bp fragment of the pET11a vector digested with the
same restriction enzymes.

The mutant SB50 encodes an engineered BFP wherein
the alterations comprise the conversion of tyrosine 67 to
histidine, the conversion of phenylalanine 65 to leucine and
the conversion of valine 164 to alanine. The specific
activity of SB50 was about 63 times that of BFP(Tyr67→His).
See Table II.



TABLE II

Mutant   Excitation   Emission   Factor of increased green   Factor of increased blue
         Maximum      Maximum    fluorescence (at maximum    fluorescence (at maximum
         (nm)         (nm)       emission) as compared to    emission) as compared to
                                 wtGFP                       BFP(Tyr67→His)
SG12     398          509        9-12X
SG11     471          508        19-38X
SG25     473          509        50-100X
SB42     387          450                                    27X
SB49     387          450                                    37X
SB50     387          450                                    63X


The dramatic increase in fluorescent activity
resulting from the amino acid substitutions of the present
invention was wholly unexpected. The cellular fluorescence of
the mutants was at least five times greater, and usually over
twenty times greater, than that of the parent wtGFP or
BFP(Tyr67→His). Note that the maximum emission wavelengths
vary among the mutants, and that the above-reported fold
increases refer only to minimal increases in relative cellular
fluorescence at the maximum emission wavelength of the mutant.
Given a particular wavelength, the values may be substantially
larger, i.e., the mutants may have a 200-fold greater cellular
fluorescence than the reference wtGFP or BFP(Tyr67→His). This
is important because devices for measuring fluorescence often
have set wavelengths, or the limitations of a given experiment
often require the use of a set wavelength. Thus, for example,
the emission and detection parameters of a fluorescence
microscope or a fluorescence-activated cell sorter may be set
for a wavelength wherein the cellular fluorescence of a given
mutant is 200-fold greater than that of the known GFPs and
BFPs.

The GFP and BFP mutants of this invention, in
contrast to the wild type protein or other reported mutants,
allow detection of green fluorescence in living mammalian
cells when present in few copies stably integrated into the
genome. This high cellular fluorescence of the mutant GFPs
and BFPs is useful for rapid and simple detection of gene
expression in living cells and tissues and for repeated
analysis of gene expression over time under a variety of
conditions. They are also useful for the construction of
stable marked cell lines that can be quickly identified by
fluorescence microscopy or fluorescence-activated cell
sorting.

Example 8 

We have established fluoroplate-based assays for the
quantitation of gene expression after transfections. In a
number of embodiments, a nucleic acid encoding a mutant GFP or
BFP of this invention is inserted into a vector and introduced
into and expressed in a cell. Typically, expression of GFP
mutants can be detected as quickly as 5 hours post-infection
or less. Expression is followed over time in living cells by
a simple measurement in multi-well plates. In this way, many
transfections can be processed in parallel.

Example 9

The vectors and nucleic acids provided herein are
used to generate chimeric proteins wherein a nucleic acid
sequence that encodes a selected gene product is fused to the
C- or N-terminus of the mutant GFPs and/or BFPs of this
invention. A number of unique viral, plasmid and hybrid gene
constructs have been generated that incorporate the new mutant
GFP and/or mutant BFP sequences indicated above. These
include:

• HIV viral sequences (in the nef gene) containing SG11 or
SG25

• Neomycin and hygromycin plasmids containing SG11 or SG25

• Moloney Leukemia Virus vector (retrovirus) also
expressing SG25

• Hybrid gene constructs expressing HIV viral proteins
(rev, td-rev, tat, nef, gag, env, and vpr) and either
SG11 or SG25 or SB50.

• Hybrid gene construct containing vectors that incorporate
the cytoplasmic proteins ran, B23, nucleolin, poly-A
binding protein and either SG11 or SG25 or SB50.

These hybrids of the mutant nucleic acids provided
herein are used to study protein trafficking in living
mammalian cells. Like the wild type GFP, the mutant GFP
proteins are normally distributed throughout the cell except
for the nucleolus. Fusions to other proteins redistribute the
fluorescence, depending on the partner in the hybrid. For
example, fusion with the entire HIV-1 Rev protein results in a
hybrid molecule which retains the Rev function and is
localized in the nucleolus where Rev is preferentially found.
Fusion to the N-terminal domain of the HIV-1 Nef protein
created a chimeric protein detected in the plasma membrane,
the site of Nef localization.

Example 10: pCMVgfo11

pCMVgfo11 is a pFRED11 derivative containing the
bacterial neomycin phosphotransferase gene (neo) (Southern and
Berg (1982) J. Mol. Appl. Genetics 1:327) fused at the
C-terminus of SG11. A four amino acid (Gly-Ala-Gly-Ala) (SEQ
ID NO:26) linker region connects the last amino acid of SG11
to the second amino acid of neo, thus generating the hybrid
SG11-neo protein (gfo11, SEQ ID NO:25). Gfo11 is expressed
from the CMV promoter and contains the intact SG11 polypeptide
and all of neo except for the first Met.

pCMVgfo11 was constructed in several steps. First,
pFRED11DNae was constructed by NaeI digestion of pFRED11 and
self-ligation of the 4613 bp fragment. The NaeI deletion
removes the SV40 promoter and neo gene from pFRED11, thus
creating pFRED11DNae. Next, in order to fuse the neo coding
region downstream of SG11, the neo gene was PCR amplified from
pcDNA3 using primers Bio51
(5'-CGCGGATCCTTCGAACAAGATGGATTGCACGC-3') (SEQ ID NO:27) and
Bio52 (5'-CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA-3') (SEQ ID
NO:28). Primer Bio51 introduces a BamHI site followed by a
BstBI recognition sequence at the 5' end of neo, while primer
Bio52 introduces an EcoRI site 3' to the neo gene. The PCR
product was digested with BamHI and EcoRI and cloned into the
4582 bp vector resulting from the BamHI-EcoRI digestion of
pFRED11DNae, thus generating pFRED11DNaeBstNeo. Subsequently,
SG11 was PCR amplified from pFRED11DNae using primers Bio49
(5'-GGCGCGCAAGAAATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAG-3') (SEQ ID
NO:29) and Bio50
(5'-CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT-3') (SEQ ID
NO:30) to remove the sg11 stop codon in pFRED11DNaeBstNeo and
to introduce the four amino acid (Gly-Ala-Gly-Ala) linker
followed by a ClaI site. The PCR product was digested with
NheI and ClaI and cloned into the 4763 bp NheI-BstBI fragment
from pFRED11DNaeBstNeo, thus generating pCMVgfo11.
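
The cloning steps above depend on restriction sites carried by the PCR primers. As a quick sanity check, the sketch below searches each primer for the recognition sequences of the enzymes named in the text. The recognition sequences are the standard ones for these enzymes; the function name is hypothetical, and only the strand as written is searched, which suffices here because all five sites are palindromic.

```python
# Standard recognition sequences for the enzymes named in the text.
SITES = {
    "BamHI": "GGATCC",
    "BstBI": "TTCGAA",
    "EcoRI": "GAATTC",
    "ClaI": "ATCGAT",
    "NheI": "GCTAGC",
}

PRIMERS = {
    "Bio51": "CGCGGATCCTTCGAACAAGATGGATTGCACGC",
    "Bio52": "CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA",
    "Bio49": "GGCGCGCAAGAAATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAG",
    "Bio50": "CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT",
}


def find_sites(seq):
    """Map enzyme name to the 0-based index of its first site in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

In Bio51 the BamHI site is found immediately upstream of the BstBI site, matching the description of a BamHI site followed by a BstBI recognition sequence.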

Following transfection of 293 cells (Graham et al.
(1977), J. Gen. Virol. 5:59) as well as other human and mouse
cell lines with pCMVgfo11, bright fluorescent transfectants
were apparent under the fluorescence microscope and colonies
resistant to G418 could be obtained two weeks later.

It should be noted that pCMVgfo11 was the best
protein fusion in terms of fluorescent emission intensity and
number of G418-resistant colonies compared to several SG11-neo
or neo-SG11 fusions generated and examined.

WO 97/42320                                        PCT/US97/07625

Example 11: pPGKgfo25

pPGKgfo25 is a pCMVgfo11 derivative containing SG25
instead of SG11 within gfo (SEQ ID NO:31). Expression of
gfo25 in pPGKgfo25 is under the control of the mouse
phosphoglycerate kinase-1 (PGK) promoter.

pPGKgfo25 was constructed in several steps. First,
a SacII site was introduced downstream of the PGK promoter in
pPGKneobpA (Soriano et al. (1991) Cell 64:393) by:

i) annealing oligonucleotides #18990 (SEQ ID NO:32)
(5'-GACCGGGACACGTATCCAGCCTCCGC-3') and #18991 (SEQ ID
NO:33) (5'-GGAGGCTGGATACGTGTCCCGGTCTGCA-3') to create a
double-stranded adapter for PstI at the 5' end and SacII
at the 3' end.

ii) ligating this adapter to the 3423 bp fragment from the
PstI-SacII double digestion of pPGKneobpA, thus
generating pPGKPtAfSc.

Next, the CMV promoter of pFRED25 was replaced with the PGK
promoter by cloning the 565 bp SalI (filled with Klenow)-SacII
region from pPGKPtAfSc into the 5288 bp BglII (filled with
Klenow)-SacII fragment from pFRED25, resulting in pFRED25PGK.
In the final step, pPGKgfo25 was constructed by ligating the
813 bp BglII-NdeI fragment from pFRED25PGK, containing the PGK
promoter and SG25, to the 4185 bp BglII-NdeI fragment of
pCMVgfo11.

Example 12: pGen-PGKgfo25RO (SEQ ID NO:34)

pGen-PGKgfo25RO is a pGen- (Soriano et al. (1991), J.
Virol. 65:2314) derivative containing the gfo25 hybrid under
the control of the PGK promoter. It was constructed by subcloning
the 2810 bp SalI fragment of pPGKgfo25 into the XhoI site of
pGen. In viruses generated from pGen-PGKgfo25RO (see below),
transcription originating from the PGK promoter is in reverse
orientation (RO) to that initiated from the viral long
terminal repeats (LTR).

To generate ecotropic or pseudotyped viruses,
pGen-PGKgfo25RO was co-transfected into 293 cells together
with pHIT60 and pHIT123 DNAs (production of ecotropic virus)
or with pHIT60 and pHCMV-G DNAs (production of pseudotyped
virus). pHIT60 and pHIT123 contain the gag-pol and env coding
regions from the Moloney murine leukemia virus (Mo-MLV),
respectively, under the control of the CMV promoter (Soneoka
et al. (1995), Nuc. Acid Res. 23:628). pHCMV-G contains the
coding region of the G protein from the vesicular stomatitis
virus (VSV) expressed from the CMV promoter (Yee et al.
(1994), Proc. Nat'l Acad. Sci. USA 91:9564). Virus-containing
supernatants were harvested 48 hours post-transfection,
filtered and stored at -80°C.



Example 13: pNLnSG11 (SEQ ID NO:35)

The SG11 sequence from plasmid pFRED11 was PCR-amplified
with primers #17982 (SEQ ID NO:36)
(5'-GGGGCGTACGGAGCGCTCCGAATTCGGTACCGTTTAAACGGGCCCTCTCGAGTCC
GTTGTACAGTTCATCCATG-3') and #17983 (SEQ ID NO:37)
(5'-GGGGGAATTCGCGCGCGTACGTAAGCGCTAGCTGAGCAAGAAATGGCTAGCAAA
GGAGAAGAACTC-3'). The PCR product was digested with BlpI and
XhoI and cloned into the large BlpI-XhoI fragment from pNL4-3
(Adachi et al. (1986), J. Virol. 59:284). In pNLnSG11 the
full SG11 polypeptide, containing an additional four
linker-encoded amino acids at the C-terminus, is expressed as
a hybrid protein with the 24 N-terminal amino acids of the
HIV-1 protein Nef.

We constructed transmissible HIV-1 stocks with our
mutants, which generate green fluorescence upon transfection
of human cells. These transmissible HIV-1 stocks are used to
detect the kinetics of infection under a variety of
conditions. In particular, they are used to study the effects
of drugs on the kinetics of infection. The level of
fluorescence, and the subcellular compartmentalization of that
fluorescence, is easily visualized and quantified using well
known methods. This system is easy to visualize, and
dramatically cuts the costs of many experiments that are
presently tedious and expensive.

To produce infectious virus, pNLnSG11 was
transfected into 293 cells. Twenty-four hours later, Jurkat cells were
added to the transfectants. At various times post-infection,
the medium was removed, filtered, and used to infect fresh
Jurkat or other HIV-1-permissive cells. Two days later the
infected cells were green under the fluorescence microscope.
Visible syncytia were also green. Viral stocks were generated
and kept at -80°C.

When the nucleic acids, vectors, and mutant proteins
provided herein are combined with the knowledge of those
skilled in the art of genetic engineering and the guidance 
provided herein, it will be apparent to one of ordinary skill 
in the art that many changes and modifications can be made 
thereto without departing from the spirit or scope of the 
invention as set forth herein. These changes and 
modifications are encompassed by the present invention. 




SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(i) APPLICANT: Pavlakis, George N.
               Gaitanaris, George A.
               Stauber, Roland H.
               Vournakis, John N.

(ii) TITLE OF INVENTION: Mutant Aequorea victoria Fluorescent
Proteins Having Increased Cellular Fluorescence

(iii) NUMBER OF SEQUENCES: 37

(iv) CORRESPONDENCE ADDRESS:
     (A) ADDRESSEE: Townsend and Townsend and Crew LLP
     (B) STREET: Two Embarcadero Center, 8th Floor
     (C) CITY: San Francisco
     (D) STATE: California
     (E) COUNTRY: USA
     (F) ZIP: 94111-3834

(v) COMPUTER READABLE FORM:
     (A) MEDIUM TYPE: Floppy disk
     (B) COMPUTER: IBM PC compatible
     (C) OPERATING SYSTEM: PC-DOS/MS-DOS
     (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
     (A) APPLICATION NUMBER: US Not yet assigned
     (B) FILING DATE: Not yet assigned
     (C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
     (A) NAME: Weber, Kenneth A.
     (B) REGISTRATION NUMBER: 31,677
     (C) REFERENCE/DOCKET NUMBER: 015280-249000

(ix) TELECOMMUNICATION INFORMATION:
     (A) TELEPHONE: (415) 576-0200
     (B) TELEFAX: (415) 576-0300

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 720 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
     (A) NAME/KEY: CDS
     (B) LOCATION: 1..720
     (D) OTHER INFORMATION: /product= "wild type Aequorea victoria
         Green Fluorescent Protein (wtGFP)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATG GCT AGC AAA GGA GAA GAA CTC TTC ACT GGA GTT GTC CCA ATT CTT   48
Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1               5                   10                  15

GTT GAA TTA GAT GGT GAT GTT AAT GGG CAC AAA TTT TCT GTC AGT GGA   96
Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
            20                  25                  30

GAG GGT GAA GGT GAT GCA ACA TAC GGA AAA CTT ACC CTT AAA TTT ATT  144
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
        35                  40                  45

TGC ACT ACT GGA AAA CTA CCT GTT CCA TGG CCA ACA CTT GTC ACT ACT  192
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
    50                  55                  60

TTC TCT TAT GGT GTT CAA TGC TTT TCA AGA TAC CCG GAT CAT ATG AAA  240
Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65                  70                  75                  80

CGG CAT GAC TTT TTC AAG AGT GCC ATG CCC GAA GGT TAT GTA CAG GAA  288
Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
                85                  90                  95

AGA ACT ATA TTT TTC AAA GAT GAC GGG AAC TAC AAG ACA CGT GCT GAA  336
Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
            100                 105                 110

GTC AAG TTT GAA GGT GAT ACC CTT GTT AAT AGA ATC GAG TTA AAA GGT  384
Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
        115                 120                 125

ATT GAT TTT AAA GAA GAT GGA AAC ATT CTT GGA CAC AAA TTG GAA TAC  432
Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
    130                 135                 140

AAC TAT AAC TCA CAC AAT GTA TAC ATC ATG GCA GAC AAA CAA AAG AAT  480
Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145                 150                 155                 160

GGA ATC AAA GTT AAC TTC AAA ATT AGA CAC AAC ATT GAA GAT GGA AGC  528
Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
                165                 170                 175

GTT CAA CTA GCA GAC CAT TAT CAA CAA AAT ACT CCA ATT GGC GAT GGC  576
Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
            180                 185                 190

CCT GTC CTT TTA CCA GAC AAC CAT TAC CTG TCC ACA CAA TCT GCC CTT  624
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
        195                 200                 205

TCG AAA GAT CCC AAC GAA AAG AGA GAC CAC ATG GTC CTT CTT GAG TTT  672
Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
    210                 215                 220

GTA ACA GCT GCT GGG ATT ACA CAT GGC ATG GAT GAA CTA TAC AAA TAA  720
Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys *
225                 230                 235                 240
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(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(Xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu 
1 5 10 15 

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly 
20 25 30 

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile 
35 40 45 

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr 
50 55 60 

Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys 
65 70 75 80 

Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu 
85 90 95 

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu 
100 105 110 

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly 
115 120 125 

Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr 
130 135 140 

Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn 
145 150 155 160 

Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser 
165 170 175 

Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly 
180 185 190 

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu 
195 200 205 

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe 
210 215 220 

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys 
225 230 235 
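
As a cross-check between the coding sequence and the protein sequence, the final 48 nucleotides of the coding region (GTA ACA ... AAA TAA) can be translated with the standard genetic code and compared with residues 225 to 239 above. The following is a minimal sketch and is not part of the original listing; the helper name `translate` and the three-letter output format are illustrative assumptions.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its one-letter amino acid using the table ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# One-letter to three-letter names, as used in the listing ("*" marks a stop).
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into three-letter codes."""
    seq = "".join(dna.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return " ".join(THREE[CODON_TABLE[c]] for c in codons)


# Last row of the coding sequence, nucleotides 673-720.
last_row = "GTA ACA GCT GCT GGG ATT ACA CAT GGC ATG GAT GAA CTA TAC AAA TAA"
print(translate(last_row))
```

The printed residues should agree with positions 225 to 239 of the protein sequence, followed by the stop codon.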



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..35 

(D) OTHER INFORMATION: /note= "oligonucleotide sense primer #16417" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GGAGGCGCGC AAGAAATGGC TAGCAAAGGA GAAGA 35 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..36 

(D) OTHER INFORMATION: /note= "oligonucleotide antisense primer #16418" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GCGGGATCCT TATTTGTATA GTTCATCCAT GCCATG 36 
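
Because primer #16418 is an antisense primer, its reverse complement should read in the sense direction. The sketch below is illustrative only and not part of the original listing; the helper name `reverse_complement` is an assumption. It shows that the reverse complement begins with the last 27 nucleotides of the coding sequence of SEQ ID NO: 1 (CAT GGC ... AAA TAA) and then continues with the extra bases GGATCCCGC.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (whitespace ignored)."""
    seq = "".join(dna.split()).upper()
    return seq.translate(COMPLEMENT)[::-1]


# SEQ ID NO: 4 as printed in the listing (antisense primer #16418).
antisense_primer = "GCGGGATCCT TATTTGTATA GTTCATCCAT GCCATG"

# Last 27 nucleotides of the SEQ ID NO: 1 coding sequence, including TAA.
coding_3prime_end = "CATGGCATGGATGAACTATACAAATAA"

sense = reverse_complement(antisense_primer)
print(sense)
print(sense.startswith(coding_3prime_end))
```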

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6238 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..6238 

(D) OTHER INFORMATION: /note= "pFRED7" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60 

CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120 

CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180 

TTAGGGTTAG GCGTTTTGCG CTGCTTCGCC TCGAGGCCTG GCCATTGCAT ACGTTGTATC 240 

CATATCATAA TATGTACATT TATATTGGCT CATGTCCAAC ATTACCGCCA TGTTGACATT 300 

GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 360 

TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 420 

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 480 

ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 540 

ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 600 

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 660 

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 720 

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 780 

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 840 

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG 900 

CCTGGAGACG CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC 960 

TCCGCGGGCG CGCAAGAAAT GGCTAGCAAA GGAGAAGAAC TCTTCACTGG AGTTGTCCCA 1020 

ATTCTTGTTG AATTAGATGG TGATGTTAAT GGGCACAAAT TTTCTGTCAG TGGAGAGGGT 1080 

GAAGGTGATG CAACATACGG AAAACTTACC CTTAAATTTA TTTGCACTAC TGGAAAACTA 1140 

CCTGTTCCAT GGCCAACACT TGTCACTACT TTCTCTTATG GTGTTCAATG CTTTTCAAGA 1200 

TACCCGGATC ATATGAAACG GCATGACTTT TTCAAGAGTG CCATGCCCGA AGGTTATGTA 1260 

CAGGAAAGAA CTATATTTTT CAAAGATGAC GGGAACTACA AGACACGTGC TGAAGTCAAG 1320 

TTTGAAGGTG ATACCCTTGT TAATAGAATC GAGTTAAAAG GTATTGATTT TAAAGAAGAT 1380 

GGAAACATTC TTGGACACAA ATTGGAATAC AACTATAACT CACACAATGT ATACATCATG 1440 

GCAGACAAAC AAAAGAATGG AATCAAAGTT AACTTCAAAA TTAGACACAA CATTGAAGAT 1500 

GGAAGCGTTC AACTAGCAGA CCATTATCAA CAAAATACTC CAATTGGCGA TGGCCCTGTC 1560 

CTTTTACCAG ACAACCATTA CCTGTCCACA CAATCTGCCC TTTCGAAAGA TCCCAACGAA 1620 

AAGAGAGACC ACATGGTCCT TCTTGAGTTT GTAACAGCTG CTGGGATTAC ACATGGCATG 1680 

GATGAACTAT ACAAATAAGG ATCCACTAGT AACGGCCGCC AGTGTGCTGG AATTCTGCAG 1740 

ATATCCATCA CACTGGCGGC CGCTCGAGCA TGCATCTAGA GGGCCCTATT CTATAGTGTC 1800 

ACCTAAATGC TAGAGCTCGC TGATCAGCCT CGACTGTGCC TTCTAGTTGC CAGCCATCTG 1860 

TTGTTTGCCC CTCCCCCGTG CCTTCCTTGA CCCTGGAAGG TGCCACTCCC ACTGTCCTTT 1920 

CCTAATAAAA TGAGGAAATT GCATCGCATT GTCTGAGTAG GTGTCATTCT ATTCTGGGGG 1980 

GTGGGGTGGG GCAGGACAGC AAGGGGGAGG ATTGGGAAGA CAATAGCAGG CATGCTGGGG 2040 

ATGCGGTGGG CTCTATGGCT TCTGAGGCGG AAAGAACCAG CTGGGGCTCT AGGGGGTATC 2100 

CCCACGCGCC CTGTAGCGGC GCATTAAGCG CGGCGGGTGT GGTGGTTACG CGCAGCGTGA 2160 

CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT TCCTTTCTCG 2220 

CCACGTTCGC CGGCTTTCCC CGTCAAGCTC TAAATCGGGG CATCCCTTTA GGGTTCCGAT 2280 

TTAGTGCTTT ACGGCACCTC GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG 2340 

GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG TTCTTTAATA 2400 

GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT 2460 

TATAAGGGAT TTTGGGGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT 2520 

TTAACGCGAA TTAATTCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT CCCCAGGCTC 2580 
CCCAGGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA 2640 

AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA 2700 

CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2760 

CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCTGCCT 2820 

CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG CCTAGGCTTT TGCAAAAAGC 2880 

TCCCGGGAGC TTGTATATCC ATTTTCGGAT CTGATCAAGA GACAGGATGA GGATCGTTTC 2940 

GCATGATTGA ACAAGATGGA TTGCACGCAG GTTCTCCGGC CGCTTGGGTG GAGAGGCTAT 3000 

TCGGCTATGA CTGGGCACAA CAGACAATCG GCTGCTCTGA TGCCGCCGTG TTCCGGCTGT 3060 

CAGCGCAGGG GCGCCCGGTT CTTTTTGTCA AGACCGACCT GTCCGGTGCC CTGAATGAAC 3120 

TGCAGGACGA GGCAGCGCGG CTATCGTGGC TGGCCACGAC GGGCGTTCCT TGCGCAGCTG 3180 

TGCTCGACGT TGTCACTGAA GCGGGAAGGG ACTGGCTGCT ATTGGGCGAA GTGCCGGGGC 3240 

AGGATCTCCT GTCATCTCAC CTTGCTCCTG CCGAGAAAGT ATCCATCATG GCTGATGCAA 3300 

TGCGGCGGCT GCATACGCTT GATCCGGCTA CCTGCCCATT CGACCACCAA GCGAAACATC 3360 

GCATCGAGCG AGCACGTACT CGGATGGAAG CCGGTCTTGT CGATCAGGAT GATCTGGACG 3420 

AAGAGCATCA GGGGCTCGCG CCAGCCGAAC TGTTCGCCAG GCTCAAGGCG CGCATGCCCG 3480 

ACGGCGAGGA TCTCGTCGTG ACCCATGGCG ATGCCTGCTT GCCGAATATC ATGGTGGAAA 3540 

ATGGCCGCTT TTCTGGATTC ATCGACTGTG GCCGGCTGGG TGTGGCGGAC CGCTATCAGG 3600 

ACATAGCGTT GGCTACCCGT GATATTGCTG AAGAGCTTGG CGGCGAATGG GCTGACCGCT 3660 

TCCTCGTGCT TTACGGTATC GCCGCTCCCG ATTCGCAGCG CATCGCCTTC TATCGCCTTC 3720 

TTGACGAGTT CTTCTGAGCG GGACTCTGGG GTTCGAAATG ACCGACCAAG CGACGCCCAA 3780 

CCTGCCATCA CGAGATTTCG ATTCCACCGC CGCCTTCTAT GAAAGGTTGG GCTTCGGAAT 3840 

CGTTTTCCGG GACGCCGGCT GGATGATCCT CCAGCGCGGG GATCTCATGC TGGAGTTCTT 3900 

CGCCCACCCC AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC 3960 

AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT 4020 

CAATGTATCT TATCATGTCT GTATACCGTC GACCTCTAGC TAGAGCTTGG CGTAATCATG 4080 

GTCATAGCTG TTTCCTGTGT GAAATTGTTA TCCGCTCACA ATTCCACACA ACATACGAGC 4140 

CGGAAGCATA AAGTGTAAAG CCTGGGGTGC CTAATGAGTG AGCTAACTCA CATTAATTGC 4200 

GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG AAACCTGTCG TGCCAGCTCC ATTAATGAAT 4260 

CGGCCAACGC GCGGGGAGAG GCGGTTTGCG TATTGGGCGC TCTTCCGCTT CCTCGCTCAC 4320 

TGACTCGCTG CGCTCGGTCG TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT 4380 

AATACGGTTA TCCACAGAAT CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA 4440 

GCAAAAGGCC AGGAACCGTA AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC 4500 

CCCTGACGAG CATCACAAAA ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT 4560 

ATAAAGATAC CAGGCGTTTC CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT 4620 

GCCGCTTACC GGATACCTGT CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCAATG 4680 
CTCACGCTGT AGGTATCTCA GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA 4740 

CGAACCCCCC GTTCAGCCCG ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA 4800 

CCCGGTAAGA CACGACTTAT CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC 4860 

GAGGTATGTA GGCGGTGCTA CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG 4920 

AAGGACAGTA TTTGGTATCT GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG 4980 

TAGCTCTTGA TCCGGCAAAC AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA 5040 

GCAGATTACG CGCAGAAAAA AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC 5100 

TGACGCTCAG TGGAACGAAA ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG 5160 

GATCTTCACC TAGATCCTTT TAAATTAAAA ATGAAGTTTT AAATCAATCT AAAGTATATA 5220 

TGAGTAAACT TGGTCTGACA GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT 5280 

CTGTCTATTT CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG 5340 

GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC GCTCACCGGC 5400 

TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC GAGCGCAGAA GTGGTCCTGC 5460 

AACTTTATCC GCCTCCATCC AGTCTATTAA TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC 5520 

GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC 5580 

GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG TTACATGATC 5640 

CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG TCAGAAGTAA 5700 

GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG CATAATTCTC TTACTGTCAT 5760 

GCCATCCGTA AGATGCTTTT CTGTGACTGG TGAGTACTCA ACCAAGTCAT TCTGAGAATA 5820 

GTGTATGCGG CGACCGAGTT GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA 5880 

TAGCAGAACT TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG 5940 

GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA ACTGATCTTC 6000 

AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA ACAGGAAGGC AAAATGCCGC 6060 

AAAAAAGGGA ATAAGGGCGA CACGGAAATG TTGAATACTC ATACTCTTCC TTTTTCAATA 6120 

TTATTGAAGC ATTTATCAGG GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA 6180 

GAAAAATAAA CAAATAGGGG TTCCGCGCAC ATTTCCCCGA AAAGTGCCAC CTGACGTC 6238 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3699 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..3699 

(D) OTHER INFORMATION: /note= "pBSGFP" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GGAAATTGTA AACGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60 

ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120 

GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180 

CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240 

CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300 

CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360 

AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420 

CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCG CGCCATTCGC CATTCAGGCT 480 

GCGCAACTGT TGGGAAGGGC GATCGGTGCG GGCCTCTTCG CTATTACGCC AGCTGGCGAA 540 

AGGGGGATGT GCTGCAAGGC GATTAAGTTG GGTAACGCCA GGGTTTTCCC AGTCACGACG 600 

TTGTAAAACG ACGGCCAGTG AATTGTAATA CGACTCACTA TAGGGCGAAT TGGGTACCGG 660 

GCCCCCCCTC GAGGTCGACG GTATCGATAA GCTTGATGAT CCTTATTTGT ATAGTTCATC 720 

CATGCCATGT GTAATCCCAG CAGCTGTTAC AAACTCAAGA AGGACCATGT GGTCTCTCTT 780 

TTCGTTGGGA TCTTTCGAAA GGGCAGATTG TGTGGACAGG TAATGGTTGT CTGGTAAAAG 840 

GACAGGGCCA TCGCCAATTG GAGTATTTTG TTGATAATGG TCTGCTAGTT GAACGCTTCC 900 

ATCTTCAATG TTGTGTCTAA TTTTGAAGTT AACTTTGATT CCATTCTTTT GTTTGTCTGC 960 

CATGATGTAT ACATTGTGTG AGTTATAGTT GTATTCCAAT TTGTGTCCAA GAATGTTTCC 1020 

ATCTTCTTTA AAATCAATAC CTTTTAACTC GATTCTATTA ACAAGGGTAT CACCTTCAAA 1080 

CTTGACTTCA GCACGTGTCT TGTAGTTCCC GTCATCTTTG AAAAATATAG TTCTTTCCTG 1140 

TACATAACCT TCGGGCATGG CACTCTTGAA AAAGTCATGC CGTTTCATAT GATCCGGGTA 1200 

TCTTGAAAAG CATTGAACAC CATAAGAGAA AGTAGTGACA AGTGTTGGCC ATGGAACAGG 1260 

TAGTTTTCCA GTAGTGCAAA TAAATTTAAG GGTAAGTTTT CCGTATGTTG CATCACCTTC 1320 

ACCCTCTCCA CTGACAGAAA ATTTGTGCCC ATTAACATCA CCATCTAATT CAACAAGAAT 1380 

TGGGACAACT CCAGTGAAGA GTTCTTCTCC TTTGCTAGCC ATTTCTTGCG CGATCGAATT 1440 

CCTGCAGCCC GGGGGATCCA CTAGTTCTAG AGCGGCCGCC ACCGCGGTGG AGCTCCAGCT 1500 

TTTGTTCCCT TTAGTGAGGG TTAATTCCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 1560 

CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAAAGT 1620 

GTAAAGCCTG GGGTGCCTAA TGAGTGAGCT AACTCACATT AATTGCGTTG CGCTCACTGC 1680 

CCGCTTTCCA GTCGGGAAAC CTGTCGTGCC AGCTGCATTA ATGAATCGGC CAACGCGCGG 1740 

GGAGAGGCGG TTTGCGTATT GGGCGCTCTT CCGCTTCCTC GCTCACTGAC TCGCTGCGCT 1800 

CGGTCGTTCG GCTGCGGCGA GCGGTATCAG CTCACTCAAA GGCGGTAATA CGGTTATCCA 1860 

CAGAATCAGG GGATAACGCA GGAAAGAACA TGTGAGCAAA AGGCCAGCAA AAGGCCAGGA 1920 

ACCGTAAAAA GGCCGCGTTG CTGGCGTTTT TCCATAGGCT CCGCCCCCCT GACGAGCATC 1980 

ACAAAAATCG ACGCTCAAGT CAGAGGTGGC GAAACCCGAC AGGACTATAA AGATACCAGG 2040 



WO 97/42320 PCT/US97/07625 









CGTTTCCCCC TGGAAGCTCC CTCGTGCGCT CTCCTGTTCC GACCCTGCCG CTTACCGGAT 2100 

ACCTGTCCGC CTTTCTCCCT TCGGGAAGCG TGGCGCTTTC TCATAGCTCA CGCTGTAGGT 2160 

ATCTCAGTTC GGTGTAGGTC GTTCGCTCCA AGCTGGGCTG TGTGCACGAA CCCCCCGTTC 2220 




AGCCCGACCG CTGCGCCTTA TCCGGTAACT ATCGTCTTGA GTCCAACCCG GTAAGACACG 2280 

ACTTATCGCC ACTGGCAGCA GCCACTGGTA ACAGGATTAG CAGAGCGAGG TATGTAGGCG 2340 

GTGCTACAGA GTTCTTGAAG TGGTGGCCTA ACTACGGCTA CACTAGAAGG ACAGTATTTG 2400 

GTATCTGCGC TCTGCTGAAG CCAGTTACCT TCGGAAAAAG AGTTGGTAGC TCTTGATCCG 2460 

GCAAACAAAC CACCGCTGGT AGCGGTGGTT TTTTTGTTTG CAAGCAGCAG ATTACGCGCA 2520 

GAAAAAAAGG ATCTCAAGAA GATCCTTTGA TCTTTTCTAC GGGGTCTGAC GCTCAGTGGA 2580 

ACGAAAACTC ACGTTAAGGG ATTTTGGTCA TGAGATTATC AAAAAGGATC TTCACCTAGA 2640 

TCCTTTTAAA TTAAAAATGA AGTTTTAAAT CAATCTAAAG TATATATGAG TAAACTTGGT 2700 

CTGACAGTTA CCAATGCTTA ATCAGTGAGG CACCTATCTC AGCGATCTGT CTATTTCGTT 2760 

CATCCATAGT TGCCTGACTC CCCGTCGTGT AGATAACTAC GATACGGGAG GGCTTACCAT 2820 

CTGGCCCCAG TGCTGCAATG ATACCGCGAG ACCCACGCTC ACCGGCTCCA GATTTATCAG 2880 

CAATAAACCA GCCAGCCGGA AGGGCCGAGC GCAGAAGTGG TCCTGCAACT TTATCCGCCT 2940 

CCATCCAGTC TATTAATTGT TGCCGGGAAG CTAGAGTAAG TAGTTCGCCA GTTAATAGTT 3000 

TGCGCAACGT TGTTGCCATT GCTACAGGCA TCGTGGTGTC ACGCTCGTCG TTTGGTATGG 3060 

CTTCATTCAG CTCCGGTTCC CAACGATCAA GGCGAGTTAC ATGATCCCCC ATGTTGTGCA 3120 

AAAAAGCGGT TAGCTCCTTC GGTCCTCCGA TCGTTGTCAG AAGTAAGTTG GCCGCAGTGT 3180 

TATCACTCAT GGTTATGGCA GCACTGCATA ATTCTCTTAC TGTCATGCCA TCCGTAAGAT 3240 

GCTTTTCTGT GACTGGTGAG TACTCAACCA AGTCATTCTG AGAATAGTGT ATGCGGCGAC 3300 

CGAGTTGCTC TTGCCCGGCG TCAATACGGG ATAATACCGC GCCACATAGC AGAACTTTAA 3360 

AAGTGCTCAT CATTGGAAAA CGTTCTTCCG GGCGAAAACT CTCAAGGATC TTACCGCTGT 3420 

TGAGATCCAG TTCGATGTAA CCCACTCGTG CACCCAACTG ATCTTCAGCA TCTTTTACTT 3480 

TCACCAGCGT TTCTGGGTGA GCAAAAACAG GAAGGCAAAA TGCCGCAAAA AAGGGAATAA 3540 

GGGCGACACG GAAATGTTGA ATACTCATAC TCTTCCTTTT TCAATATTAT TGAAGCATTT 3600 

ATCAGGGTTA TTGTCTCATG AGCGGATACA TATTTGAATG TATTTAGAAA AATAAACAAA 3660 

TAGGGGTTCC GCGCACATTT CCCCGAAAAG TGCCACCTG 3699 




(2) INFORMATION FOR SEQ ID NO: 7: 











(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6361 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..6361 

(D) OTHER INFORMATION: /note= "pFRED13" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

TTCTCATGTT TGACAGCTTA TCATCGATAA GCTTTAATGC GGTAGTTTAT CACAGTTAAA 60 

TTGCTAACGC AGTCAGGCAC CGTGTATGAA ATCTAACAAT GCGCTCATCG TCATCCTCGG 120 

CACCGTCACC CTGGATGCTG TAGGCATAGG CTTGGTTATG CCGGTACTGC CGGGCCTCTT 180 

GCGGGATATC CGGATATAGT TCCTCCTTTC AGCAAAAAAC CCCTCAAGAC CCGTTTAGAG 240 

GCCCCAAGGG GTTATGCTAG TTATTGCTCA GCGGTGGCAG CAGCCAACTC AGCTTCCTTT 300 

CGGGCTTTGT TAGCAGCCGG ATCCTTATTT GTATAGTTCA TCCATGCCAT GTGTAATCCC 360 

AGCAGCTGTT ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA 420 

AAGGGCAGAT TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT 480 

TGGAGTATTT TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGTCT 540 

AATTTTGAAG TTAACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG 600 

TGAGTTATAG TTGTATTCCA ATTTGTGTCC AAGAATGTTT CCATCTTCTT TAAAATCAAT 660 

ACCTTTTAAC TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT 720 

CTTGTAGTTC CCGTCATCTT TGAAAAATAT AGTTCTTTCC TGTACATAAC CTTCGGGCAT 780 

GGCACTCTTG AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC 840 

ACCATAAGAG AAAGTAGTGA CAAGTGTTGG CCATGGAACA GGTAGTTTTC CAGTAGTGCA 900 

AATAAATTTA AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA 960 

AAATTTGTGC CCATTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA 1020 

GAGTTCTTCT CCTTTGCTAG CCATATGTAT ATCTCCTTCT TAAAGTTAAA CAAAATTATT 1080 

TCTAGAGGGG AATTGTTATC CGCTCACAAT TCCCCTATAG TGAGTCGTAT TAATTTCGCG 1140 

GGATCGAGAT CTCGATCCTC TACGCCGGAC GCATCGTGGC CGGCATCACC GGCGCCACAG 1200 

GTGCGGTTGC TGGCGCCTAT ATCGCCGACA TCACCGATGG GGAAGATCGG GCTCGCCACT 1260 

TCGGGCTCAT GAGCGCTTGT TTCGGCGTGG GTATGGTGGC AGGCCCCGTG GCCGGGGGAC 1320 

TGTTGGGCGC CATCTCCTTG CATGCACCAT TCCTTGCGGC GGCGGTGCTC AACGGCCTCA 1380 

ACCTACTACT GGGCTGCTTC CTAATGCAGG AGTCGCATAA GGGAGAGCGT CGAGATCCCG 1440 

GACACCATCG AATGGCGCAA AACCTTTCGC GGTATGGCAT GATAGCGCCC GGAAGAGAGT 1500 

CAATTCAGGG TGGTGAATGT GAAACCAGTA ACGTTATACG ATGTCGCAGA GTATGCCGGT 1560 

GTCTCTTATC AGACCGTTTC CCGCGTGGTG AACCAGGCCA GCCACGTTTC TGCGAAAACG 1620 

CGGGAAAAAG TGGAAGCGGC GATGGCGGAG CTGAATTACA TTCCCAACCG CGTGGCACAA 1680 

CAACTGGCGG GCAAACAGTC GTTGCTGATT GGCGTTGCCA CCTCCAGTCT GGCCCTGCAC 1740 

GCGCCGTCGC AAATTGTCGC GGCGATTAAA TCTCGCGCCG ATCAACTGGG TGCCAGCGTG 1800 

GTGGTGTCGA TGGTAGAACG AAGCGGCGTC GAAGCCTGTA AAGCGGCGGT GCACAATCTT 1860 

CTCGCGCAAC GCGTCAGTGG GCTGATCATT AACTATCCGC TGGATGACCA GGATGCCATT 1920 

GCTGTGGAAG CTGCCTGCAC TAATGTTCCG GCGTTATTTC TTGATGTCTC TGACCAGACA 1980 

CCCATCAACA GTATTATTTT CTCCCATGAA GACGGTACGC GACTGGGCGT GGAGCATCTG 2040 

GTCGCATTGG GTCACCAGCA AATCGCGCTG TTAGCGGGCC CATTAAGTTC TGTCTCGGCG 2100 

CGTCTGCGTC TGGCTGGCTG GCATAAATAT CTCACTCGCA ATCAAATTCA GCCGATAGCG 2160 

GAACGGGAAG GCGACTGGAG TGCCATGTCC GGTTTTCAAC AAACCATGCA AATGCTGAAT 2220 

GAGGGCATCG TTCCCACTGC GATGCTGGTT GCCAACGATC AGATGGCGCT GGGCGCAATG 2280 

CGCGCCATTA CCGAGTCCGG GCTGCGCGTT GGTGCGGATA TCTCGGTAGT GGGATACGAC 2340 

GATACCGAAG ACAGCTCATG TTATATCCCG CCGTTAACCA CCATCAAACA GGATTTTCGC 2400 

CTGCTGGGGC AAACCAGCGT GGACCGCTTG CTGCAACTCT CTCAGGGCCA GGCGGTGAAG 2460 

GGCAATCAGC TGTTGCCCGT CTCACTGGTG AAAAGAAAAA CCACCCTGGC GCCCAATACG 2520 

CAAACCGCCT CTCCCCGCGC GTTGGCCGAT TCATTAATGC AGCTGGCACG ACAGGTTTCC 2580 

CGACTGGAAA GCGGGCAGTG AGCGCAACGC AATTAATGTA AGTTAGCTCA CTCATTAGGC 2640 

ACCGGGATCT CGACCGATGC CCTTGAGAGC CTTCAACCCA GTCAGCTCCT TCCGGTGGGC 2700 

GCGGGGCATG ACTATCGTCG CCGCACTTAT GACTGTCTTC TTTATCATGC AACTCGTAGG 2760 

ACAGGTGCCG GCAGCGCTCT GGGTCATTTT CGGCGAGGAC CGCTTTCGCT GGAGCGCGAC 2820 

GATGATCGGC CTGTCGCTTG CGGTATTCGG AATCTTGCAC GCCCTCGCTC AAGCCTTCGT 2880 

CACTGGTCCC GCCACCAAAC GTTTCGGCGA GAAGCAGGCC ATTATCGCCG GCATGGCGGC 2940 

CGACGCGCTG GGCTACGTCT TGCTGGCGTT CGCGACGCGA GGCTGGATGG CCTTCCCCAT 3000 

TATGATTCTT CTCGCTTCCG GCGGCATCGG GATGCCCGCG TTGCAGGCCA TGCTGTCCAG 3060 

GCAGGTAGAT GACGACCATC AGGGACAGCT TCAAGGATCG CTCGCGGCTC TTACCAGCCT 3120 

AACTTCGATC ACTGGACCGC TGATCGTCAC GGCGATTTAT GCCGCCTCGG CGAGCACATG 3180 

GAACGGGTTG GCATGGATTG TAGGCGCCGC CCTATACCTT GTCTGCCTCC CCGCGTTGCG 3240 

TCGCGGTGCA TGGAGCCGGG CCACCTCGAC CTGAATGGAA GCCGGCGGCA CCTCGCTAAC 3300 

GGATTCACCA CTCCAAGAAT TGGAGCCAAT CAATTCTTGC GGAGAACTGT GAATGCGCAA 3360 

ACCAACCCTT GGCAGAACAT ATCCATCGCG TCCGCCATCT CCAGCAGCCG CACGCGGCGC 3420 

ATCTCGGGCA GCGTTGGGTC CTGGCCACGG GTGCGCATGA TCGTGCTCCT GTCGTTGAGG 3480 

ACCCGGCTAG GCTGGCGGGG TTGCCTTACT GGTTAGCAGA ATGAATCACC GATACGCGAG 3540 

CGAACGTGAA GCGACTGCTG CTGCAAAACG TCTGCGACCT GAGCAACAAC ATGAATGGTC 3600 

TTCGGTTTCC GTGTTTCGTA AAGTCTGGAA ACGCGGAAGT CAGCGCCCTG CACCATTATG 3660 

TTCCGGATCT GCATCGCAGG ATGCTGCTGG CTACCCTGTG GAACACCTAC ATCTGTATTA 3720 

ACGAAGCGCT GGCATTGACC CTGAGTGATT TTTCTCTGGT CCCGCCGCAT CCATACCGCC 3780 

AGTTGTTTAC CCTCACAACG TTCCAGTAAC CGGGCATGTT CATCATCAGT AACCCGTATC 3840 

GTGAGCATCC TCTCTCGTTT CATCGGTATC ATTACCCCCA TGAACAGAAA TCCCCCTTAC 3900 

ACGGAGGCAT CAGTGACCAA ACAGGAAAAA ACCGCCCTTA ACATGGCCCG CTTTATCAGA 3960 

AGCCAGACAT TAACGCTTCT GGAGAAACTC AACGAGCTGG ACGCGGATGA ACAGGCAGAC 4020 

ATCTGTGAAT CGCTTCACGA CCACGCTGAT GAGCTTTACC GCAGCTGCCT CGCGCGTTTC 4080 

GGTGATGACG GTGAAAACCT CTGACACATG CAGCTCCCGG AGACGGTCAC AGCTTGTCTG 4140 

TAAGCGGATG CCGGGAGCAG ACAAGCCCGT CAGGGCGCGT CAGCGGGTGT TGGCGGGTCT 4200 

CGGGGCGCAG CCATGACCCA GTCACGTAGC GATAGCGGAG TCTATACTCG CTTAACTATG 4260 

CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATATG CGGTGTGAAA TACCGCACAG 4320 

ATGCGTAAGG AGAAAATACC GCATCAGGCG CTCTTCCGCT TCCTCGCTCA CTGACTCGCT 4380 

GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG TAATACGGTT 4440 

ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA GCAAAAGGCC AGCAAAAGGC 4500 

CAGGAACCGT AAAAAGGCCG CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA 4560 

GCATCACAAA AATCGACGCT CAAGTCAGAG GTGGCGAAAC CCGACAGGAC TATAAAGATA 4620 

CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC 4680 

CGGATACCTG TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG CTTTCTCATA GCTCACGCTG 4740 

TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC ACGAACCCCC 4800 

CGTTCAGCCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTCAGTCCA ACCCGGTAAG 4860 

ACACGACTTA TCGCCACTGG CAGCAGCCAC TGGTAACAGG ATTAGCAGAG CGAGGTATCT 4920 

AGGCGGTGCT ACAGAGTTCT TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT 4980 

ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA AAAAGAGTTG GTAGCTCTTG 5040 

ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT GTTTCCAAGC AGCAGATTAC 5100 

GCGCAGAAAA AAAGGATCTC AAGAAGATCC TTTGATCTTT TCTACGGGGT CTGACGCTCA 5160 

GTGGAACGAA AACTCACGTT AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC 5220 

CTAGATCCTT TTAAATTAAA AATGAAGTTT TAAATCAATC TAAAGTATAT ATCAGTAAAC 5280 

TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT 5340 

TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA ACTACGATAC GGGAGGGCTT 5400 

ACCATCTGGC CCCAGTGCTC CAATGATACC GCGAGACCCA CGCTCACCGG CTCCAGATTT 5460 

ATCAGCAATA AACCAGCCAG CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC 5520 

CGCCTCCATC CAGTCTATTA ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA 5580 

TAGTTTGCGC AACGTTGTTG CCATTGCTGC AGGCATCGTG GTGTCACGCT CGTCGTTTGG 5640 

TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA GTTACATCAT CCCCCATCTT 5700 

GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC TCCGATCGTT GTCAGAAGTA AGTTGGCCGC 5760 

AGTGTTATCA CTCATGGTTA TGGCAGCACT GCATAATTCT CTTACTGTCA TCCCATCCGT 5820 

AAGATGCTTT TCTGTGACTG GTGAGTACTC AACCAAGTCA TTCTCAGAAT AGTCTATGCG 5880 

GCGACCGAGT TGCTCTTGCC CGGCGTCAAC ACGGGATAAT ACCGCGCCAC ATAGCAGAAC 5940 

TTTAAAAGTG CTCATCATTG GAAAACGTTC TTCGGGGCGA AAACTCTCAA GGATCTTACC 6000 

GCTGTTGAGA TCCAGTTCGA TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT 6060 

TACTTTCACC AGCGTTTCTG GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG 6120 

AATAAGGGCG ACACGGAAAT GTTGAATACT CATACTCTTC CTTTTTCAAT ATTATTGAAG 6180 

CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA 6240 

ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA CCTGACGTCT AAGAAACCAT 6300 

TATTATCATG ACATTAACCT ATAAAAATAG GCGTATCACG AGGCCCTTTC GTCTTCAAGA 6360 

A 6361 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..48 

(D) OTHER INFORMATION: /note= "oligonucleotide #17422" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
CAATTTGTGT CCCAGAATGT TGCCATCTTC CTTGAAGTCA ATACCTTT 48 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: /note= "oligonucleotide #17423" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GTCTTGTAGT TGCCGTCATC TTTGAAGAAG ATGCTCCTTT CCTGTAC 47 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..52 

(D) OTHER INFORMATION: /note= "oligonucleotide #17424" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CATGGAACAG GCAGTTTGCC AGTAGTGCAG ATGAACTTCA GGGTAAGTTT TC 52 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..40 

(D) OTHER INFORMATION: /note= "oligonucleotide #17425" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CTCCACTGAC AGAGAACTTG TGGCCGTTAA CATCACCATC 40 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..47 

(D) OTHER INFORMATION: /note= "oligonucleotide #17426" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CCATCTTCAA TGTTGTGGCG GGTCTTGAAG TTCACTTTGA TTCCATT 47 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 41 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..41 

(D) OTHER INFORMATION: /note= "oligonucleotide #17465" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CGATAAGCTT GAGGATCCTC AGTTGTACAG TTCATCCATG C 41 

(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 849 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..849 

(D) OTHER INFORMATION: /note= "pBSGFPsg11" 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

ATGACCATGA TTACGCCAAG CTCGGAATTA ACCCTCACTA AAGGGAACAA AAGCTGGAGC 60 

TCCACCGCGG TGGCGGCCGC TCTAGAACTA GTGGATCCCC CGGGCTGCAG GAATTCGATC 120 

GCGCAAGAAA TGGCTAGCAA AGGAGAAGAA CTCTTCACTG GAGTTGTCCC AATTCTTGTT 180 

GAATTAGATG GTGATGTTAA CGGCCACAAG TTCTCTGTCA GTGGAGAGGG TGAAGGTGAT 240 

GCAACATACG GAAAACTTAC CCTGAAGTTC ATCTGCACTA CTGGCAAACT GCCTGTTCCA 300 

TGGCCAACAC TTGTCACTAC TCTCTCTTAT GGTGTTCAAT GCTTTTCAAG ATACCCGGAT 360 

CATATGAAAC GGCATGACTT TTTCAAGAGT GCCATGCCCG AAGGTTATGT ACAGGAAAGG 420 

ACCATCTTCT TCAAAGATGA CGGCAACTAC AAGACACGTG CTGAAGTCAA GTTTGAAGGT 480 

GATACCCTTG TTAATAGAAT CGAGTTAAAA GGTATTGACT TCAAGGAAGA TGGCAACATT 540 

CTGGGACACA AATTGGAATA CAACTATAAC TCACACAATG TATACATCAT GGCAGACAAA 600 

CAAAAGAATG GAATCAAAGT GAACTTCAAG ACCCGCCACA ACATTGAAGA TGGAAGCGTT 660 

CAACTAGCAG ACCATTATCA ACAAAATACT CCAATTGGCG ATGGCCCTGT CCTTTTACCA 720 

GACAACCATT ACCTGTCCAC ACAATCTGCC CTTTCGAAAG ATCCCAACGA AAAGAGAGAC 780 

CACATGGTCC TTCTTGAGTT TGTAACAGCT GCTGGGATTA CACATGGCAT GGATGAACTG 840 

TACAACTGA 849 

(2) INFORMATION FOR SEQ ID NO: 15: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /note= "SG12" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 

CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360 

GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 

GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540 

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600 

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660 

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /note= "SG11" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 

CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360 

GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 

GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540 

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600 

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660 

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 720

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 
(A) NAME/KEY: -

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /note= "SG25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 

CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360 

GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 

GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540 
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 720

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #18217"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CATTGAACAC CATAGCACAG AGTAGTGACT AGTGTTGGCC 40

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB42"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 

CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360 

GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 

GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600 

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660 
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 20: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: - 

(B) LOCATION: 1..40 

(D) OTHER INFORMATION: /note= "oligonucleotide #bio25" 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 
CATTGAACAC CATGAGAGAG AGTAGTGACT AGTGTTGGCC 40

(2) INFORMATION FOR SEQ ID NO: 21: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180

CTAGTCACTA CTTTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360

GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 


GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540 

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600 

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..44

(D) OTHER INFORMATION: /note= "oligonucleotide #19059"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
CTTCAATGTT GTGGCGGATC TTGAAGTTCG CTTTGATTCC ATTC 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..40 

(D) OTHER INFORMATION: /note= "oligonucleotide #bio24"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
CATTGAACAC CATGAGAGAA AGTAGTGACT AGTGTTGGCC 



(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 

CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360 


GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480 

GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600 

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660 

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720 






(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1521 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1521

(D) OTHER INFORMATION: /note= "pCMVgfo11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60

GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240

CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300

TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360

GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480

GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540

GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600

TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660 

CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720 

GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780 

CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840 

CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900 

GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960

GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020 

GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTCAT 1080 

GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140 

CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200 

GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260 

CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320 

GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380 

CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440 

CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500 

CTTCTTGACG AGTTCTTCTG A 1521

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Gly Ala Gly Ala
1

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: - 

(B) LOCATION: 1..32 

(D) OTHER INFORMATION: /note= "primer Bio51"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
CGCGGATCCT TCGAACAAGA TGGATTGCAC GC 32 

(2) INFORMATION FOR SEQ ID NO: 28: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..34

(D) OTHER INFORMATION: /note= "primer Bio52"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CCGGAATTCT CAGAAGAACT CGTCAAGAAG GCGA 34 

(2) INFORMATION FOR SEQ ID NO:29: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: - 

(B) LOCATION: 1..46 

(D) OTHER INFORMATION: /note= "primer Bio49" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GGCGCGCAAG AAATGGCTAG CAAAGGAGAA GAACTCTTCA CTGGAG 46 

(2) INFORMATION FOR SEQ ID NO: 30: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA 

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /note= "primer Bio50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
CCCATCGATA GCACCAGCAC CGTTGTACAG TTCATCCATG CCATGT 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1521 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..1521 

(D) OTHER INFORMATION: /note= "pPGKgfo25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120 
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180 
CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260

CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320

GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380

CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440 

CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500 

CTTCTTGACG AGTTCTTCTG A 1521

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..26 

(D) OTHER INFORMATION: /note= "oligonucleotide #18990"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
GACCGGGACA CGTATCCAGC CTCCGC 26 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE:

(A) NAME/KEY: - 

(B) LOCATION: 1..28 

(D) OTHER INFORMATION: /note= "oligonucleotide #18991"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GGAGGCTGGA TACGTGTCCC GGTCTGCA 28 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7617 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 1..7617

(D) OTHER INFORMATION: /note= "pGen-PGKgfo25RO"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
TCGAGGTCGA CGGTATCGAT TAGTCCAATT TGTTAAAGAC AGGATATCAG TGGTCCAGGC 60
TCTAGTTTTG ACTCAACAAT ATCACCAGCT GAAGCCTATA GAGTACGAGC CATAGATAAA 120
ATAAAAGATT TTATTTAGTC TCCAGAAAAA GGGGGGAATG AAAGACCCCA CCTGTAGGTT 180
TGGCAAGCTA GCTTAAGTAA CGCCATTTTG CAAGGCATGG AAAAATACAT AACTGAGAAT 240
AGAGAAGTTC AGATCAAGGT CAGGAACAGA TGGAACAGCT GAATATGGGC CAAACAGGAT 300
ATCTGTGGTA AGCAGTTCCT GCCCCGGCTC AGGGCCAAGA ACAGATGGAA CAGCTGAATA 360
TGGGCCAAAC AGGATATCTG TGGTAAGCAG TTCCTGCCCC GGCTCAGGGC CAAGAACAGA 420
TGGTCCCCAG ATGCGGTCCA GCCCTCAGCA GTTTCTAGAG AACCATCAGA TGTTTCCAGG 480
GTGCCCCAAG GACCTGAAAT GACCCTGTGC CTTATTTGAA CTAACCAATC AGTTCGCTTC 540
TCGCTTCTGT TCGCGCGCTT CTGCTCCCCG AGCTCAATAA AAGAGCCCAC AACCCCTCAC 600
TCGGGGCGCC AGTCCTCCGA TTGACTGAGT CGCCCGGGTA CCCGTGTATC CAATAAACCC 660
TCTTGCAGTT GCATCCGACT TGTGGTCTCG CTGTTCCTTG GGAGGGTCTC CTCTGAGTGA 720
TTGACTACCC GTCAGCGGGG GTCTTTCATT TGGGGGCTCG TCCGGGATCG GGAGACCCCT 780
GCCCAGGGAC CACCGACCCA CCACCGGGAG GTAAGCTGGC CAGCAACTTA TCTGTGTCTG 840
TCCGATTGTC TAGTGTCTAT GACTGATTTT ATGCGCCTGC GTCGGTACTA GTTAGCTAAC 900
TAGCTCTGTA TCTGGCGGAC CCGTGGTGGA ACTGACGAGT TCGGAACACC CGGCCGCAAC 960
CCTGGGAGAC GTCCCAGGGA CTTCGGGGGC CGTTTTTGTG GCCCGACCTG AGTCCAAAAA 1020
TCCCGATCGT TTTGGACTCT TTGGTGCACC CCCCTTAGAG GAGGGATATG TGGTTCTGGT 1080
AGGAGACGAG AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGGA 1140
CCGAAGCCGC GCCGCGCGTC TTGTCTGCTG CAGCATCGTT CTGTGTTGTC TCTGTCTGAC 1200
TGTGTTTCTG TATTTGTCTG AGAATATGGG CCAGACTGTT ACCACTCCCT TAAGTTTGAC 1260
CTTAGGTCAC TGGAAAGATG TCGAGCGGAT CGCTCACAAC CAGTCGGTAG ATGTCAAGAA 1320
GAGACGTTGG GTTACCTTCT GCTCTGCAGA ATGGCCAACC TTTAACGTCG GATCGCCGCG 1380
AGACGGCACC TTTAACCGAG ACCTCATCAC CCAGGTTAAG ATCAAGGTCT TTTCACCTCG 1440
CCCGCATGGA CACCCAGACC AGGTCCCCTA CATCGTGACC TGGGAAGCCT TGGCTTTTGA 1500
CCCCCCTCCC TGGGTCAAGC CCTTTGTACA CCCTAAGCCT CCGCCTCCTC TTCCTCCATC 1560
CGCCCCGTCT CTCCCCCTTG AACCTCCTCG TTCGACCCCG CCTCGATCCT CCCTTTATCC 1620
AGCCCTCACT CCTTCTCGAC GGTATACAGA CATGATAAGA TACATTGATG AGTTTGGACA 1680
AACCACAACT AGAATGCAGT GAAAAAAATG CTTTATTTGT GAAATTTGTG ATGCTATTGC 1740
TTTATTTGTA ACCATTATAA GCTGCAATAA ACAAGTTGGG GTGGGCGAAG AACTCCAGCA 1800
TGAGATCCCC GCGCTGGAGG ATCATCCAGC CGGCGAACGT GGCGAGAAAG GAAGGGAAGA 1860
AAGCGAAAGG AGCGGGCGCT AGGGCGCTGG CAAGTGTAGC GGTCACGCTG CGCGTAACCA 1920
CCACACCCGC CGCGCTTAAT GCGCCGCTAC AGGGCGCGTG GGGATACCCC CTAGAGCCCC 1980
AGCTGGTTCT TTCCGCCTCA GAAGCCATAG AGCCCACCGC ATCCCCAGCA TGCCTGCTAT 2040
TGTCTTCCCA ATCCTCCCCC TTGCTGTCCT GCCCCACCCC ACCCCCCAGA ATAGAATGAC 2100
ACCTACTCAG ACAATGCGAT GCAATTTCCT CATTTTATTA GGAAAGGACA GTGGGAGTGG 2160
CACCTTCCAG GGTCAAGGAA GGCACGGGGG AGGGGCAAAC AACAGATGGC TGGCAACTAG 2220
AAGGCACAGT CGAGGCTGAT CAGCGAGCTC TAGCATTTAG GTGACACTAT AGAATAGGGC 2280
CCTCTAGATG CATGCTCGAG CGGCCGCCAG TGTGATGGAT ATCTGCAGAA TTCTCAGAAG 2340
AACTCGTCAA GAAGGCGATA GAAGGCGATG CGCTGCGAAT CGGGAGCGGC GATACCGTAA 2400
AGCACGAGGA AGCGGTCAGC CCATTCGCCG CCAAGCTCTT CAGCAATATC ACGGGTAGCC 2460
AACGCTATGT CCTGATAGCG GTCCGCCACA CCCAGCCGGC CACAGTCGAT GAATCCAGAA 2520
AAGCGGCCAT TTTCCACCAT GATATTCGGC AAGCAGGCAT CGCCATGGGT CACGACGAGA 2580
TCCTCGCCGT CGGGCATGCG CGCCTTGAGC CTGGCGAACA GTTCGGCTGG CGCGAGCCCC 2640
TGATGCTCTT CGTCCAGATC ATCCTGATCG ACAAGACCGG CTTCCATCCG AGTACGTGCT 2700
CGCTCGATGC GATGTTTCGC TTGGTGGTCG AATGGGCAGG TAGCCGGATC AAGCGTATGC 2760
AGCCGCCGCA TTGCATCAGC CATGATGGAT ACTTTCTCGG CAGGAGCAAG GTGAGATGAC 2820
AGGAGATCCT GCCCCGGCAC TTCGCCCAAT AGCAGCCAGT CCCTTCCCGC TTCAGTGACA 2880
ACGTCGAGCA CAGCTGCGCA AGGAACGCCC GTCGTGGCCA GCCACGATAG CCGCGCTGCC 2940
TCGTCCTGCA GTTCATTCAG GGCACCGGAC AGGTCGGTCT TGACAAAAAG AACCGGGCGC 3000
CCCTGCGCTG ACAGCCGGAA CACGGCGGCA TCAGAGCAGC CGATTGTCTG TTGTGCCCAG 3060
TCATAGCCGA ATAGCCTCTC CACCCAAGCG GCCGGAGAAC CTGCGTGCAA TCCATCTTGT 3120
TCGATAGCAC CAGCACCGTT GTACAGTTCA TCCATGCCAT GTGTAATCCC AGCAGCTGTT 3180
ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA AAGGGCAGAT 3240
TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT TGGAGTATTT 3300
TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGGCG GGTCTTGAAG 3360
TTCACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG TGAGTTATAG 3420
TTGTATTCCA ATTTGTGTCC CAGAATGTTG CCATCTTCCT TGAAGTCAAT ACCTTTTAAC 3480
TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT CTTGTAGTTG 3540
CCGTCATCTT TGAAGAAGAT GGTCCTTTCC TGTACATAAC CTTCGGGCAT GGCACTCTTG 3600
AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC ACCATAGCAC 3660
AGAGTAGTGA CTAGTGTTGG CCATGGAACA GGCAGTTTGC CAGTAGTGCA GATGAACTTC 3720
AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA GAACTTGTGG 3780
CCGTTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA GAGTTCTTCT 3840
CCTTTGCTAG CCATTTCTTG CGCGCCCGCG GAGGCTGGAT ACGTGTCCCG GTCTGCAGGT 3900
CGAAAGGCCC GGAGATGAGG AAGAGGAGAA CAGCGCGGCA GACGTGCGCT TTTGAAGCGT 3960
GCAGAATGCC GGGCTCCGGA GGACCTTCGC GCCCGCCCCG CCCCTGAGCC CGCCCCTGAG 4020

CCCGCCCCCG GACCCACCCC TTCCCAGCCT CTGAGCCCAG AAAGCGAAGG AGCCAAGCTG 4080 

CTATTGGCCG CTGCCCCAAA GGCCTACCCG CTTCCATTGC TCAGCGGTGC TGTCCATCTG 4140

CACGAGACTA GTGAGACGTG CTACTTCCAT TTGTCACGTC CTGCACGACG CGAGCTGCGG 4200 

GGCGGGGGGG AACTTCCTGA CTAGGGGAGG AGTAGAAGGT GGCGCGAAGG GGCCACCAAA 4260 

GAAGGGAGCC GGTTGGCGCT ACCGGTGGAT GTGGAATGTG TGCGAGGCCA GAGGCCACTT 4320 

GTGTAGCGCC AAGTGCCAGC GGGGCTGCTA AAGCGCATGC TCCAGACTGC CTTGGGAAAA 4380 

GCGCCTCCCC TACCCGGTAG AATTCGATAT CAAGCTTATC GATACCGTCG AGATCTCCCG 4440 

ATCCGTCGAG GTCGACGGTA TCGATTAGTC CAATTTGTTA AAGACAGGAT ATCAGTGGTC 4500 

CAGGCTCTAG TTTTGACTCA ACAATATCAC CAGCTGAAGC CTATAGAGTA CGAGCCATAG 4560 

ATAAAATAAA AGATTTTATT TAGTCTCCAG AAAAAGGGGG GAATGAAAGA CCCCACCTGT 4620 

AGGTTTGGCA AGCTAGCTTA AGTAACGCCA TTTTGCAAGG CATGGAAAAA TACATAACTG 4680 

AGAATAGAGA AGTTCAGATC GGGATCCCAA TTCTTTCGGA CTTTTGAAAG TGATGGTGGT 4740 

GGGGGAAGGA TTCGAACCTT CGAAGTCGAT GACGGCAGAT TTAGAGTCTG CTCCCTTTGG 4800 

CCGCTCGGGA ACCCCACCAC GGGTAATGCT TTTACTGGCC TGCTCCCTTA TCGGGAAGCG 4860 

GGGCGCATCA TATCAAATGA CGCGCCGCTG TAAAGTGTTA CGTTGAGAAA GAATTGGGAT 4920 

CCCGATCAAG GTCAGGAACA GATGGAACAG CTAGAGAACC ATCAGATGTT TCCAGGGTGC 4980 

CCCAAGGACC TGAAATGACC CTGTGCCTTA TTTGAACTAA CCAATCAGTT CGCTTCTCGC 5040 

TTCTGTTCGC GCGCTTCTGC TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG 5100

GGCGCCAGTC CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT AAACCCTCTT 5160 

GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG GGTCTCCTCT GAGTGATTGA 5220 

CTACCCGTCA GCGGGGGTCT TTCACCCAGA GTTTGGAACT TACTGTCTTC TTGGGACCTG 5280 

CAGCCCGGGG GATCCACTAG TTCTAGAGCG GCCGCCACCG CGGTGGATTC TGCCTCGCGC 5340 

GTTTCGGTGA TGACGGTGAA AACCTCTGAC ACATGCAGCT CCCGGAGACG GTCACAGCTT 5400 

GTCTGTAAGC GGATGCCGGG AGCAGACAAG CCCGTCAGGG CGCGTCAGCG GGTGTTGGCG 5460 

GGTGTCGGGG CGCAGCCATG ACCCAGTCAC GTAGCGATAG CGGAGTGTAT ACTGGCTTAA 5520 

CTATGCGGCA TCAGAGCAGA TTGTACTGAG AGTGCACCAT ATGCGGTGTG AAATACCGCA 5580

CAGATGCGTA AGGAGAAAAT ACCGCATCAG GCGCTCTTCC GCTTCCTCGC TCACTGACTC 5640

GCTGCGCTCG GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG 5700 

GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA 5760 

GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 5820 

CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG 5880 

ATACCAGGCG TTTCCCCCTG GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT 5940 

TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AATGCTCACG 6000 

CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 6060 

CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT 6120
AAGACACGAC TTATCGCCAC TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA 6180
TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGGAC 6240
AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 6300
TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT 6360
TACGCGCAGA AAAAAAGGAT CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC 6420
TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT 6480
CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA 6540
AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT 6600
ATTTCGTTCA TCCATAGTTG CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG 6660
CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA 6720
TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 6780
ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT 6840
TAATAGTTTG CGCAACGTTG TTGCCATTGC TGCAGGCATC GTGGTGTCAC GCTCGTCGTT 6900
TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT 6960
GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 7020
CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC 7080
CGTAAGATGC TTTTCTGTGA CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT 7140
GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AACACGGGAT AATACCGCGC CACATAGCAG 7200
AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 7260
ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC 7320
TTTTACTTTC ACCAGCGTTT CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA 7380
GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG 7440
AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 7500
TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC 7560
CATTATTATC ATGACATTAA CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTCT 7617






(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15581 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..15581

(D) OTHER INFORMATION: /note= "pNLnSG11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

TGGAAGGGCT AATTTGGTCC CAAAAAAGAC AAGAGATCCT TGATCTGTGG ATCTACCACA 60 

CACAAGGCTA CTTCCCTGAT TGGCAGAACT ACACACCAGG GCCAGGGATC AGATATCCAC 120 

TGACCTTTGG ATGGTGCTTC AAGTTAGTAC CAGTTGAACC AGAGCAAGTA GAAGAGGCCA 180 

AATAAGGAGA GAAGAACAGC TTGTTACACC CTATGAGCCA GCATGGGATG GAGGACCCGG 240 

AGGGAGAAGT ATTAGTGTGG AAGTTTGACA GCCTCCTAGC ATTTCGTCAC ATGGCCCGAG 300 

AGCTGCATCC GGAGTACTAC AAAGACTGCT GACATCGAGC TTTCTACAAG GGACTTTCCG 360 

CTGGGGACTT TCCAGGGAGG TGTGGCCTGG GCGGGACTGG GGAGTGGCGA GCCCTCAGAT 420

GCTACATATA AGCAGCTGCT TTTTGCCTGT ACTGGGTCTC TCTGGTTAGA CCAGATCTGA 480 

GCCTGGGAGC TCTCTGGCTA ACTAGGGAAC CCACTGCTTA AGCCTCAATA AAGCTTGCCT 540

TGAGTGCTCA AAGTAGTGTG TGCCCGTCTG TTGTGTGACT CTGGTAACTA GAGATCCCTC 600 

AGACCCTTTT AGTCAGTGTG GAAAATCTCT AGCAGTGGCG CCCGAACAGG GACTTGAAAG 660

CGAAAGTAAA GCCAGAGGAG ATCTCTCGAC GCAGGACTCG GCTTGCTGAA GCGCGCACGG 720 

CAAGAGGCGA GGGGCGGCGA CTGGTGAGTA CGCCAAAAAT TTTGACTAGC GGAGGCTAGA 780 

AGGAGAGAGA TGGGTGCGAG AGCGTCGGTA TTAAGCGGGG GAGAATTAGA TAAATGGGAA 840 

AAAATTCGGT TAAGGCCAGG GGGAAAGAAA CAATATAAAC TAAAACATAT AGTATGGGCA 900 

AGCAGGGAGC TAGAACGATT CGCAGTTAAT CCTGGCCTTT TAGAGACATC AGAAGGCTGT 960 

AGACAAATAC TGGGACAGCT ACAACCATCC CTTCAGACAG GATCAGAAGA ACTTAGATCA 1020 

TTATATAATA CAATAGCAGT CCTCTATTGT GTGCATCAAA GGATAGATGT AAAAGACACC 1080 

AAGGAAGCCT TAGATAAGAT AGAGGAAGAG CAAAACAAAA GTAAGAAAAA GGCACAGCAA 1140 

GCAGCAGCTG ACACAGGAAA CAACAGCCAG GTCAGCCAAA ATTACCCTAT AGTGCAGAAC 1200 

CTCCAGGGGC AAATGGTACA TCAGGCCATA TCACCTAGAA CTTTAAATGC ATGGGTAAAA 1260 

GTAGTAGAAG AGAAGGCTTT CAGCCCAGAA GTAATACCCA TGTTTTCAGC ATTATCAGAA 1320 

GGAGCCACCC CACAAGATTT AAATACCATG CTAAACACAG TGGGGGGACA TCAAGCAGCC 1380 

ATGCAAATGT TAAAAGAGAC CATCAATGAG GAAGCTGCAG AATGGGATAG ATTGCATCCA 1440 

GTGCATGCAG GGCCTATTGC ACCAGGCCAG ATGAGAGAAC CAAGGGGAAG TGACATAGCA 1500 

GGAACTACTA GTACCCTTCA GGAACAAATA GGATGGATGA CACATAATCC ACCTATCCCA 1560
GTAGGAGAAA TCTATAAAAG ATGGATAATC CTGGGATTAA ATAAAATAGT AAGAATGTAT 1620
AGCCCTACCA GCATTCTGGA CATAAGACAA GGACCAAAGG AACCCTTTAG AGACTATGTA 1680
GACCGATTCT ATAAAACTCT AAGAGCCGAG CAAGCTTCAC AAGAGGTAAA AAATTGGATG 1740
ACAGAAACCT TGTTGGTCCA AAATGCGAAC CCAGATTGTA AGACTATTTT AAAAGCATTG 1800
GGACCAGGAG CGACACTAGA AGAAATGATG ACAGCATGTC AGGGAGTGGG GGGACCCGGC 1860
CATAAAGCAA GAGTTTTGGC TGAAGCAATG AGCCAAGTAA CAAATCCAGC TACCATAATG 1920
ATACAGAAAG GCAATTTTAG GAACCAAAGA AAGACTGTTA AGTGTTTCAA TTGTGGCAAA 1980
GAAGGGCACA TAGCCAAAAA TTGCAGGGCC CCTAGGAAAA AGGGCTGTTG GAAATGTGGA 2040
AAGGAAGGAC ACCAAATGAA AGATTGTACT GAGAGACAGG CTAATTTTTT AGGGAAGATC 2100
TGGCCTTCCC ACAAGGGAAG GCCAGGGAAT TTTCTTCAGA GCAGACCAGA GCCAACAGCC 2160
CCACCAGAAG AGAGCTTCAG GTTTGGGGAA GAGACAACAA CTCCCTCTCA GAAGCAGGAG 2220
CCGATAGACA AGGAACTGTA TCCTTTAGCT TCCCTCAGAT CACTCTTTGG CAGCGACCCC 2280
TCGTCACAAT AAAGATAGGG GGGCAATTAA AGGAAGCTCT ATTAGATACA GGAGCAGATG 2340
ATACAGTATT AGAAGAAATG AATTTGCCAG GAAGATGGAA ACCAAAAATG ATAGGGGGAA 2400
TTGGAGGTTT TATCAAAGTA GGACAGTATG ATCAGATACT CATAGAAATC TGCGGACATA 2460
AAGCTATAGG TACAGTATTA GTAGGACCTA CACCTGTCAA CATAATTGGA AGAAATCTGT 2520
TGACTCAGAT TGGCTGCACT TTAAATTTTC CCATTAGTCC TATTGAGACT GTACCAGTAA 2580
AATTAAAGCC AGGAATGGAT GGCCCAAAAG TTAAACAATG GCCATTGACA GAAGAAAAAA 2640
TAAAAGCATT AGTAGAAATT TGTACAGAAA TGGAAAAGGA AGGAAAAATT TCAAAAATTG 2700
GGCCTGAAAA TCCATACAAT ACTCCAGTAT TTGCCATAAA GAAAAAAGAC AGTACTAAAT 2760
GGAGAAAATT AGTAGATTTC AGAGAACTTA ATAAGAGAAC TCAAGATTTC TGGGAAGTTC 2820
AATTAGGAAT ACCACATCCT GCAGGGTTAA AACAGAAAAA ATCAGTAACA GTACTGGATG 2880
TGGGCGATGC ATATTTTTCA GTTCCCTTAG ATAAAGACTT CAGGAAGTAT ACTGCATTTA 2940
CCATACCTAG TATAAACAAT GAGACACCAG GGATTAGATA TCAGTACAAT GTGCTTCCAC 3000
AGGGATGGAA AGGATCACCA GCAATATTCC AGTGTAGCAT GACAAAAATC TTAGAGCCTT 3060
TTAGAAAACA AAATCCAGAC ATAGTCATCT ATCAATACAT GGATGATTTG TATGTAGGAT 3120
CTGACTTAGA AATAGGGCAG CATAGAACAA AAATAGAGGA ACTGAGACAA CATCTGTTGA 3180
GGTGGGGATT TACCACACCA GACAAAAAAC ATCAGAAAGA ACCTCCATTC CTTTGGATGG 3240
GTTATGAACT CCATCCTGAT AAATGGACAG TACAGCCTAT AGTGCTGCCA GAAAAGGACA 3300
GCTGGACTGT CAATGACATA CAGAAATTAG TGGGAAAATT GAATTGGGCA AGTCAGATTT 3360
ATGCAGGGAT TAAAGTAAGG CAATTATGTA AACTTCTTAG GGGAACCAAA GCACTAACAG 3420
AAGTAGTACC ACTAACAGAA GAAGCAGAGC TAGAACTGGC AGAAAACAGG GAGATTCTAA 3480
AAGAACCGGT ACATGGAGTG TATTATGACC CATCAAAAGA CTTAATAGCA GAAATACAGA 3540
AGCAGGGGCA AGGCCAATGG ACATATCAAA TTTATCAAGA GCCATTTAAA AATCTGAAAA 3600
CAGGAAAATA TGCAAGAATG AAGGGTGCCC ACACTAATGA TGTGAAACAA TTAACAGAGG 3660

CAGTACAAAA AATAGCCACA GAAAGCATAG TAATATGGGG AAAGACTCCT AAATTTAAAT 3720

TACCCATACA AAAGGAAACA TGGGAAGCAT GGTGGACAGA GTATTGGCAA GCCACCTGGA 3780

TTCCTGAGTG GGAGTTTGTC AATACCCCTC CCTTAGTGAA GTTATGGTAC CAGTTAGAGA 3840 

AAGAACCCAT AATAGGAGCA GAAACTTTCT ATGTAGATGG GGCAGCCAAT AGGGAAACTA 3900 

AATTAGGAAA AGCAGGATAT GTAACTGACA GAGGAAGACA AAAAGTTGTC CCCCTAACGG 3960 

ACACAACAAA TCAGAAGACT GAGTTACAAG CAATTCATCT AGCTTTGCAG GATTCGGGAT 4020 

TAGAAGTAAA CATAGTGACA GACTCACAAT ATGCATTGGG AATCATTCAA GCACAACCAG 4080 

ATAAGAGTGA ATCAGAGTTA GTCAGTCAAA TAATAGAGCA GTTAATAAAA AAGGAAAAAG 4140

TCTACCTGGC ATGGGTACCA GCACACAAAG GAATTGGAGG AAATGAACAA GTAGATGGGT 4200 

TGGTCAGTGC TGGAATCAGG AAAGTACTAT TTTTAGATGG AATAGATAAG GCCCAAGAAG 4260

AACATGAGAA ATATCACAGT AATTGGAGAG CAATGGCTAG TGATTTTAAC CTACCACCTG 4320

TAGTAGCAAA AGAAATAGTA GCCAGCTGTG ATAAATGTCA GCTAAAAGGG GAAGCCATGC 4380

ATGGACAAGT AGACTGTAGC CCAGGAATAT GGCAGCTAGA TTGTACACAT TTAGAAGGAA 4440

AAGTTATCTT GGTAGCAGTT CATGTAGCCA GTGGATATAT AGAAGCAGAA GTAATTCCAG 4500 

CAGAGACAGG GCAAGAAACA GCATACTTCC TCTTAAAATT AGCAGGAAGA TGGCCAGTAA 4560 

AAACAGTACA TACAGACAAT GGCAGCAATT TCACCAGTAC TACAGTTAAG GCCGCCTGTT 4620 

GGTGGGCGGG GATCAAGCAG GAATTTGGCA TTCCCTACAA TCCCCAAAGT CAAGGAGTAA 4680 

TAGAATCTAT GAATAAAGAA TTAAAGAAAA TTATAGGACA GGTAAGAGAT CAGGCTGAAC 4740

ATCTTAAGAC AGCAGTACAA ATGGCAGTAT TCATCCACAA TTTTAAAAGA AAAGGGGGGA 4800

TTGGGGGGTA CAGTGCAGGG GAAAGAATAG TAGACATAAT AGCAACAGAC ATACAAACTA 4860 

AAGAATTACA AAAACAAATT ACAAAAATTC AAAATTTTCG GGTTTATTAC AGGGACAGCA 4920

GAGATCCAGT TTGGAAAGGA CCAGCAAAGC TCCTCTGGAA AGGTGAAGGG GCAGTAGTAA 4980 

TACAAGATAA TAGTGACATA AAAGTAGTGC CAAGAAGAAA AGCAAAGATC ATCAGGGATT 5040

ATGGAAAACA GATGGCAGGT GATGATTGTG TGGCAAGTAG ACAGGATGAG GATTAACACA 5100 

TGGAAAAGAT TAGTAAAACA CCATATGTAT ATTTCAAGGA AAGCTAAGGA CTGGTTTTAT 5160 

AGACATCACT ATGAAAGTAC TAATCCAAAA ATAAGTTCAG AAGTACACAT CCCACTAGGG 5220 

GATGCTAAAT TAGTAATAAC AACATATTGG GGTCTGCATA CAGGAGAAAG AGACTGGCAT 5280 

TTGGGTCAGG GAGTCTCCAT AGAATGGAGG AAAAAGAGAT ATAGCACACA AGTAGACCCT 5340 

GACCTAGCAG ACCAACTAAT TCATCTGCAC TATTTTGATT GTTTTTCAGA ATCTGCTATA 5400 

AGAAATACCA TATTAGGACG TATAGTTAGT CCTAGGTGTG AATATCAAGC AGGACATAAC 5460 

AAGGTAGGAT CTCTACAGTA CTTGGCACTA GCAGCATTAA TAAAACCAAA ACAGATAAAG 5520 

CCACCTTTGC CTAGTGTTAG GAAACTGACA GAGGACAGAT GGAACAAGCC CCAGAAGACC 5580 

AAGGGCCACA GAGGGAGCCA TACAATGAAT GGACACTAGA GCTTTTAGAG GAACTTAAGA 5640 

GTGAAGCTGT TAGACATTTT CCTAGGATAT GGCTCCATAA CTTAGGACAA CATATCTATG 5700 



AAACTTACGG GGATACTTGG GCAGGAGTGG AAGCCATAAT AAGAATTCTG CAACAACTGC 5760 
TGTTTATCCA TTTCAGAATT GGGTGTCGAC ATAGCAGAAT AGGCGTTACT CGACAGAGGA 5820 
GAGCAAGAAA TGGAGCCAGT AGATCCTAGA CTAGAGCCCT GGAAGCATCC AGGAAGTCAG 5880 
CCTAAAACTG CTTGTACCAA TTGCTATTGT AAAAAGTGTT GCTTTCATTG CCAAGTTTGT 5940 
TTCATGACAA AAGCCTTAGG CATCTCCTAT GGCAGGAAGA AGCGGAGACA GCGACGAAGA 6000 
GCTCATCAGA ACAGTCAGAC TCATCAAGCT TCTCTATCAA AGCAGTAAGT AGTACATGTA 6060 
ATGCAACCTA TAATAGTAGC AATAGTAGCA TTAGTAGTAG CAATAATAAT AGCAATAGTT 6120 
GTGTGGTCCA TAGTAATCAT AGAATATAGG AAAATATTAA GACAAAGAAA AATAGACAGG 6180 
TTAATTGATA GACTAATAGA AAGAGCAGAA GACAGTGGCA ATGAGAGTGA AGGAGAAGTA 6240 
TCAGCACTTG TGGAGATGGG GGTGGAAATG GGGCACCATG CTCCTTGGGA TATTGATGAT 6300 
CTGTAGTGCT ACAGAAAAAT TGTGGGTCAC AGTCTATTAT GGGGTACCTG TGTGGAAGGA 6360 
AGCAACCACC ACTCTATTTT GTGCATCAGA TGCTAAAGCA TATGATACAG AGGTACATAA 6420 
TGTTTGGGCC ACACATGCCT GTGTACCCAC AGACCCCAAC CCACAAGAAG TAGTATTGGT 6480 
AAATGTGACA GAAAATTTTA ACATGTGGAA AAATGACATG GTAGAACAGA TGCATGAGGA 6540 
TATAATCAGT TTATGGGATC AAAGCCTAAA GCCATGTGTA AAATTAACCC CACTCTGTGT 6600 
TAGTTTAAAG TGCACTGATT TGAAGAATGA TACTAATACC AATAGTAGTA GCGGGAGAAT 6660 
GATAATGGAG AAAGGAGAGA TAAAAAACTG CTCTTTCAAT ATCAGCACAA GCATAAGAGA 6720 
TAAGGTGCAG AAAGAATATG CATTCTTTTA TAAACTTGAT ATAGTACCAA TAGATAATAC 6780 
CAGCTATAGG TTGATAAGTT GTAACACCTC AGTCATTACA CAGGCCTGTC CAAAGGTATC 6840 
CTTTGAGCCA ATTCCCATAC ATTATTGTGC CCCGGCTGGT TTTGCGATTC TAAAATGTAA 6900 
TAATAAGACG TTCAATGGAA CAGGACCATG TACAAATGTC AGCACAGTAC AATGTACACA 6960 
TGGAATCAGG CCAGTAGTAT CAACTCAACT GCTGTTAAAT GGCAGTCTAG CAGAAGAAGA 7020 
TGTAGTAATT AGATCTGCCA ATTTCACAGA CAATGCTAAA ACCATAATAG TACAGCTGAA 7080 
CACATCTGTA GAAATTAATT GTACAAGACC CAACAACAAT ACAAGAAAAA GTATCCGTAT 7140 
CCAGAGGGGA CCAGGGAGAG CATTTGTTAC AATAGGAAAA ATAGGAAATA TGAGACAAGC 7200 
ACATTGTAAC ATTAGTAGAG CAAAATGGAA TGCCACTTTA AAACAGATAG CTAGCAAATT 7260 
AAGAGAACAA TTTGGAAATA ATAAAACAAT AATCTTTAAG CAATCCTCAG GAGGGGACCC 7320 
AGAAATTGTA ACGCACAGTT TTAATTGTGG AGGGGAATTT TTCTACTGTA ATTCAACACA 7380 
ACTGTTTAAT AGTACTTGGT TTAATAGTAC TTGGAGTACT GAAGGGTCAA ATAACACTGA 7440 
AGGAAGTGAC ACAATCACAC TCCCATGCAG AATAAAACAA TTTATAAACA TGTGGCAGGA 7500 
AGTAGGAAAA GCAATGTATG CCCCTCCCAT CAGTGGACAA ATTAGATGTT CATCAAATAT 7560 
TACTGGGCTG CTATTAACAA GAGATGGTGG TAATAACAAC AATGGGTCCG AGATCTTCAG 7620 
ACCTGGAGGA GGCGATATGA GGGACAATTG GAGAAGTGAA TTATATAAAT ATAAAGTAGT 7680 
AAAAATTGAA CCATTAGGAG TAGCACCCAC CAAGGCAAAG AGAAGAGTGG TGCAGAGAGA 7740 
AAAAAGAGCA GTGGGAATAG GAGCTTTGTT CCTTGGGTTC TTGGGAGCAG CAGGAAGCAC 7800 
TATGGGCGCA GCGTCAATGA CGCTGACGGT ACAGGCCAGA CAATTATTGT CTGATATAGT 7860 
GCAGCAGCAG AACAATTTGC TGAGGGCTAT TGAGGCGCAA CAGCATCTGT TGCAACTCAC 7920 
AGTCTGGGGC ATCAAACAGC TCCAGGCAAG AATCCTGGCT GTGGAAAGAT ACCTAAAGGA 7980 
TCAACAGCTC CTGGGGATTT GGGGTTGCTC TGGAAAACTC ATTTGCACCA CTGCTGTGCC 8040 
TTGGAATGCT AGTTGGAGTA ATAAATCTCT GGAACAGATT TGGAATAACA TGACCTGGAT 8100 
GGAGTGGGAC AGAGAAATTA ACAATTACAC AAGCTTAATA CACTCCTTAA TTGAAGAATC 8160 
GCAAAACCAG CAAGAAAAGA ATGAACAAGA ATTATTGGAA TTAGATAAAT GGGCAAGTTT 8220 
GTGGAATTGG TTTAACATAA CAAATTGGCT GTGGTATATA AAATTATTCA TAATGATAGT 8280 
AGGAGGCTTG GTAGGTTTAA GAATAGTTTT TGCTGTACTT TCTATAGTGA ATAGAGTTAG 8340 
GCAGGGATAT TCACCATTAT CGTTTCAGAC CCACCTCCCA ATCCCGAGGG GACCCGACAG 8400 
GCCCGAAGGA ATAGAAGAAG AAGGTGGAGA GAGAGACAGA GACAGATCCA TTCGATTAGT 8460 
GAACGGATCC TTAGCACTTA TCTGGGACGA TCTGCGGAGC CTGTGCCTCT TCAGCTACCA 8520 
CCGCTTGAGA GACTTACTCT TGATTGTAAC GAGGATTGTG GAACTTCTGG GACGCAGGGG 8580 
GTGGGAAGCC CTCAAATATT GGTGGAATCT CCTACAGTAT TGGAGTCAGG AACTAAAGAA 8640 
TAGTGCTGTT AACTTGCTCA ATGCCACAGC CATAGCAGTA GCTGAGGGGA CAGATAGGGT 8700 
TATAGAAGTA TTACAAGCAG CTTATAGAGC TATTCGCCAC ATACCTAGAA GAATAAGACA 8760 
GGGCTTGGAA AGGATTTTGC TATAAGATGG GTGGCAAGTG GTCAAAAAGT AGTGTGATTG 8820 
GATGGCCTGC TGTAAGGGAA AGAATGAGAC GAGCTGAGCA AGAAATGGCT AGCAAAGGAG 8880 
AAGAACTCTT CACTGGAGTT GTCCCAATTC TTGTTGAATT AGATGGTGAT GTTAACGGCC 8940 
ACAAGTTCTC TGTCAGTGGA GAGGGTGAAG GTGATGCAAC ATACGGAAAA CTTACCCTGA 9000 
AGTTCATCTG CACTACTGGC AAACTGCCTG TTCCATGGCC AACACTTGTC ACTACTCTCT 9060 
CTTATGGTGT TCAATGCTTT TCAAGATACC CGGATCATAT GAAACGGCAT GACTTTTTCA 9120 
AGAGTGCCAT GCCCGAAGGT TATGTACAGG AAAGGACCAT CTTCTTCAAA GATGACGGCA 9180 
ACTACAAGAC ACGTGCTGAA GTCAAGTTTG AAGGTGATAC CCTTGTTAAT AGAATCGAGT 9240 
TAAAAGGTAT TGACTTCAAG GAAGATGGCA ACATTCTGGG ACACAAATTG GAATACAACT 9300 
ATAACTCACA CAATGTATAC ATCATGGCAG ACAAACAAAA GAATGGAATC AAAGTGAACT 9360 
TCAAGACCCG CCACAACATT GAAGATGGAA GCGTTCAACT AGCAGACCAT TATCAACAAA 9420 
ATACTCCAAT TGGCGATGGC CCTGTCCTTT TACCAGACAA CCATTACCTG TCCACACAAT 9480 
CTGCCCTTTC GAAAGATCCC AACGAAAAGA GAGACCACAT GGTCCTTCTT GAGTTTGTAA 9540 
CAGCTGCTGG GATTACACAT GGCATGGATG AACTGTACAA CGGACTCGAG ACCTAGAAAA 9600 
ACATGGAGCA ATCACAAGTA GCAATACAGC AGCTAACAAT GCTGCTTGTG CCTGGCTAGA 9660 
AGCACAAGAG GAGGAAGAGG TGGGTTTTCC AGTCACACCT CAGGTACCTT TAAGACCAAT 9720 
GACTTACAAG GCAGCTGTAG ATCTTAGCCA CTTTTTAAAA GAAAAGGGGG GACTGGAAGG 9780 
GCTAATTCAC TCCCAAAGAA GACAAGATAT CCTTGATCTG TGGATCTACC ACACACAAGG 9840 
CTACTTCCCT GATTGGCAGA ACTACACACC AGGGCCAGGG GTCAGATATC CACTGACCTT 9900 
TGGATGGTGC TACAAGCTAG TACCAGTTGA GCCAGATAAG GTAGAAGAGG CCAATAAAGG 9960 
AGAGAACACC AGCTTGTTAC ACCCTGTGAG CCTGCATGGA ATGGATGACC CTGAGAGAGA 10020 
AGTGTTAGAG TGGAGGTTTG ACAGCCGCCT AGCATTTCAT CACGTGGCCC GAGAGCTGCA 10080 
TCCGGAGTAC TTCAAGAACT GCTGACATCG AGCTTGCTAC AAGGGACTTT CCGCTGGGGA 10140 
CTTTCCAGGG AGGCGTGGCC TGGGCGGGAC TGGGGAGTGG CGAGCCCTCA GATGCTGCAT 10200 
ATAAGCAGCT GCTTTTTGCC TGTACTGGGT CTCTCTGGTT AGACCAGATC TGAGCCTGGG 10260 
AGCTCTCTGG CTAACTAGGG AACCCACTGC TTAAGCCTCA ATAAAGCTTG CCTTGAGTGC 10320 
TTCAAGTAGT GTGTGCCCGT CTGTTGTGTG ACTCTGGTAA CTAGAGATCC CTCAGACCCT 10380 
TTTAGTCAGT GTGGAAAATC TCTAGCACCC CCCAGGAGGT AGAGGTTGCA GTGAGCCAAG 10440 
ATCGCGCCAC TGCATTCCAG CCTGGGCAAG AAAACAAGAC TGTCTAAAAT AATAATAATA 10500 
AGTTAAGGGT ATTAAATATA TTTATACATG GAGGTCATAA AAATATATAT ATTTGGGCTG 10560 
GGCGCAGTGG CTCACACCTG CGCCCGGCCC TTTGGGAGGC CGAGGCAGGT GGATCACCTG 10620 
AGTTTGGGAG TTCCAGACCA GCCTGACCAA CATGGAGAAA CCCCTTCTCT GTGTATTTTT 10680 
AGTAGATTTT ATTTTATGTG TATTTTATTC ACAGGTATTT CTGGAAAACT GAAACTGTTT 10740 
TTCCTCTACT CTGATACCAC AAGAATCATC AGCACAGAGG AAGACTTCTG TGATCAAATG 10800 
TGGTGGGAGA GGGAGGTTTT CACCAGCACA TGAGCAGTCA GTTCTGCCGC AGACTCGGCG 10860 
GGTGTCCTTC GGTTCAGTTC CAACACCGCC TGCCTGGAGA GAGGTCAGAC CACAGGGTGA 10920 
GGGCTCAGTC CCCAAGACAT AAACACCCAA GACATAAACA CCCAACAGGT CCACCCCGCC 10980 
TGCTGCCCAG GCAGAGCCGA TTCACCAAGA CGGGAATTAG GATAGAGAAA GAGTAAGTCA 11040 
CACAGAGCCG GCTGTGCGGG AGAACGGAGT TCTATTATGA CTCAAATCAG TCTCCCCAAG 11100 
CATTCGGGGA TCAGAGTTTT TAAGGATAAC TTAGTGTGTA GGGGGCCAGT GAGTTGGAGA 11160 
TGAAAGCGTA GGGAGTCGAA GGTGTCCTTT TGOGCCGAGT CAGTTCCTGG GTGGGGGCCA 11220 
CAAGATCGGA TGAGCCAGTT TATCAATCCG GGGGTGCCAG CTGATCCATG GAGTGCAGGG 11280 
TCTGCAAAAT ATCTCAAGCA CTGATTGATC TTAGGTTTTA CAATAGTGAT GTTACCCCAG 11340 
GAACAATTTG GGGAAGGTCA GAATCTTGTA GCCTGTAGCT GCATGACTCC TAAACCATAA 11400 
TTTCTTTTTT GTTTTTmT TTTTATTTTT GAGACAGGGT CTCACTCTGT CACCTAGGCT 11460 
GGAGTGCAGT GGTGCAATCA CAGCTCACTG CAGCCTCAAC GTCGTAAGCT CAAGCGATCC 11520 
TCCCACCTCA GCCTGCCTGG TAGCTGAGAC TACAAGCGAC GCCCCAGTTA ATTTTTGTAT 11580 
TTTTGGTAGA GGCAGCGTTT TGCCGTGTGG CCCTGGCTGG TCTCGAACTC CTGGGCTCAA 11640 
GTGATCCAGC CTCAGCCTCC CAAAGTGCTG GGACAACCGG GGCCAGTCAC TGCACCTGGC 11700 
CCTAAACCAT AATTTCTAAT CTTTTGGCTA ATTTGTTAGT CCTACAAAGG CAGTCTAGTC 11760 
CCCAGGCAAA AAGGGGGTTT GTTTCGGGAA AGGGCTGTTA CTGTCTTTGT TTCAAACTAT 11820 
AAACTAAGTT CCTCCTAAAC TTAGTTCGGC CTACACCCAG GAATGAACAA GGAGAGCTTG 11880 
GAGGTTAGAA GCACGATGGA ATTGGTTAGG TCAGATCTCT TTCACTGTCT GAGTTATAAT 11940 
TTTGCAATGG TGGTTCAAAG ACTGCCCGCT TCTGACACCA GTCGCTGCAT TAATGAATCG 12000 
GCCAACGCGC GGGGAGAGGC GGTTTGCGTA TTGGCGCTCT TCCGCTTCCT CGCTCACTGA 12060 
CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT 12120 
ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA 12180 
AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC 12240 
TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA 12300 
AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC TCTCCTGTTC CGACCCTGCC 12360 
GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC 12420 
ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA 12480 
ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC 12540 
GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG 12600 
GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG 12660 
GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG 12720 
CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA 12780 
GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA 12840 
CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT 12900 
CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA 12960 
GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG 13020 
TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA 13080 
GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC 13140 

AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC 13200 

TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC 13260 

AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC 13320 

GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC 13380 

CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT 13440 

GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC 13500 

ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG 13560 

TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG 13620 

CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT 13680 

CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC 13740 

ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA 13800 

AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA 13860 

TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA 13920 

AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA GTGCCACCTG ACGTCTAAGA 13980 

AACCATTATT ATCATGACAT TAACCTATAA AAATAGGCGT ATCACGAGGC CCTTTCGTCT 14040 

TCAAGAACTG CCTCGCGCGT TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC 14100 



CGGAGACGGT CACAGCTTGT CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG 14160 
CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG 14220 
GAGTGTACTG GCTTAACTAT GCGGCATCAG AGCAGATTGT ACTGAGAGTG CACCATATGC 14280 
GGTGTGAAAT ACCGCACAGA TGCGTAAGGA GAAAATACCG CATCAGGCGC CATTCGCCAT 14340 
TCAGGCTGCG CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC 14400 
GCGGGGAGGC AGAGATTGCA GTAAGCTGAG ATCGCAGCAC TGCACTCCAG CCTGGGCGAC 14460 
AGAGTAAGAC TCTGTCTCAA AAATAAAATA AATAAATCAA TCAGATATTC CAATCTTTTC 14520 
CTTTATTTAT TTATTTATTT TCTATTTTGG AAACACAGTC CTTCCTTATT CCAGAATTAC 14580 
ACATATATTC TATTTTTCTT TATATGCTCC AGTTTTTTTT AGACCTTCAC CTGAAATGTG 14640 
TGTATACAAA ATCTAGGCCA GTCCAGCAGA GCCTAAAGGT AAAAAATAAA ATAATAAAAA 14700 
ATAAATAAAA TCTAGCTCAC TCCTTCACAT CAAAATGGAG ATACAGCTGT TAGCATTAAA 14760 
TACCAAATAA CCCATCTTGT CCTCAATAAT TTTAAGCGCC TCTCTCCACC ACATCTAACT 14820 
CCTGTCAAAG GCATGTGCCC CTTCCGGGCG CTCTGCTGTG CTGCCAACCA ACTGGCATGT 14880 
GGACTCTGCA GGGTCCCTAA CTGCCAAGCC CCACAGTGTG CCCTGAGGCT GCCCCTTCCT 14940 
TCTAGCGGCT GCCCCCACTC GGCTTTGCTT TCCCTAGTTT CAGTTACTTG CGTTCAGCCA 15000 
AGGTCTGAAA CTAGGTGCGC ACAGAGCGGT AAGACTGCGA GAGAAAGAGA CCAGCTTTAC 15060 
AGGGGGTTTA TCACAGTGCA CCCTGACAGT CGTCAGCCTC ACAGGGGGTT TATCACATTG 15120 
CACCCTGACA GTCGTCAGCC TCACAGGGGG TTTATCACAG TGCACCCTTA CAATCATTCC 15180 
ATTTGATTCA CAATTTTTTT AGTCTCTACT GTGCCTAACT TGTAAGTTAA ATTTGATCAG 15240 
AGGTGTGTTC CCAGAGGGGA AAACAGTATA TACAGGGTTC AGTACTATCG CATTTCAGGC 15300 
CTCCACCTGG GTCTTGGAAT GTGTCCCCCG AGGGGTGATG ACTACCTCAG TTGGATCTCC 15360 
ACAGGTCACA GTGACACAAG ATAACCAAGA CACCTCCCAA GGCTACCACA ATGGGCCGCC 15420 
CTCCACGTGC ACATGGCCGG AGGAACTGCC ATGTCGGAGG TGCAAGCACA CCTGCGCATC 15480 
AGAGTCCTTG GTGTGGAGGG AGGGACCAGC GCAGCTTCCA GCCATCCACC TGATGAACAG 15540 
AACCTAGGGA AAGCCCCAGT TCTACTTACA CCAGGAAAGG C 15581 

(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 74 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 
(A) NAME/KEY: - 
(B) LOCATION: 1..74 
(D) OTHER INFORMATION: /note= "primer #17982" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 
GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT 
ACAGTTCATC CATG 

(2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 66 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 
(A) NAME/KEY: - 
(B) LOCATION: 1..66 
(D) OTHER INFORMATION: /note= "primer #17983" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 
GGGGGAATTC GCGCGCGTAC GTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA 
GAACTC 
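
The declared lengths of the two primers can be checked against the listed sequences. The following is a minimal Python sketch, not part of the original specification; the names `PRIMERS`, `normalize`, and `reverse_complement` are hypothetical helpers introduced only for illustration. It strips the ten-base grouping used in the listing, compares each length with the value stated under SEQUENCE CHARACTERISTICS, and computes the reverse complement, which is the strand a downstream primer would read on the template.

```python
# Primer sequences copied from SEQ ID NO:36 and SEQ ID NO:37 as listed above.
# The second tuple element is the LENGTH declared in each listing entry.
PRIMERS = {
    "17982": (
        "GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT "
        "ACAGTTCATC CATG",
        74,
    ),
    "17983": (
        "GGGGGAATTC GCGCGCGTAC GTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA "
        "GAACTC",
        66,
    ),
}

# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize(listing_text: str) -> str:
    """Remove the whitespace that groups bases in blocks of ten."""
    return "".join(listing_text.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


if __name__ == "__main__":
    for number, (text, declared) in PRIMERS.items():
        seq = normalize(text)
        status = "ok" if len(seq) == declared else "MISMATCH"
        print(f"primer #{number}: {len(seq)} bases (declared {declared}) {status}")
        print(f"  reverse complement: {reverse_complement(seq)}")
```

The same length check applies to the long plasmid listing: each full row holds 60 bases, so the position index printed at the end of each row should increase by 60 from one row to the next.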
WHAT IS CLAIMED IS: 

1. An isolated nucleic acid that encodes an engineered Aequorea victoria fluorescent protein, wherein the protein encoded by the isolated nucleic acid is selected from the group that consists of: 
a. a protein that has leucine at amino acid position 65, and wherein said protein has a cellular fluorescence that is at least five times greater than the cellular fluorescence of wild type Aequorea victoria green fluorescent protein; 
b. a protein that has leucine at amino acid position 65 and threonine at position 168, and wherein said protein has a cellular fluorescence that is at least five times greater than wild type Aequorea victoria green fluorescent protein; 
c. a protein that has leucine at amino acid position 65, threonine at position 168, and cysteine at position 66, wherein said protein has a cellular fluorescence that is at least five times greater than the cellular fluorescence of wild type Aequorea victoria green fluorescent protein; 
d. a blue fluorescent protein that has histidine at amino acid position 67, leucine at position 65 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His); 
e. a blue fluorescent protein that has histidine at amino acid position 67, alanine at amino acid position 164 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His); 
f. a blue fluorescent protein that has histidine at amino acid position 67, leucine at amino acid position 65, alanine at amino acid position 164 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His). 

2. An isolated nucleic acid of claim 1, which encodes an engineered Aequorea victoria green fluorescent protein ("GFP") having a cellular fluorescence that is at least five times greater than that of wild type GFP, wherein the engineered GFP has a leucine at amino acid position 65. 

3. An isolated nucleic acid according to claim 2, wherein the nucleic acid further encodes a threonine at amino acid position 168. 

4. An isolated nucleic acid according to claim 3, wherein the nucleic acid further encodes a cysteine at amino acid position 66. 

5. An isolated nucleic acid of claim 1 that encodes an engineered blue fluorescent protein ("BFP") that has histidine at amino acid position 67 and leucine at position 65, and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His). 

6. An isolated nucleic acid of claim 1 that encodes an engineered blue fluorescent protein ("BFP") that has histidine at amino acid position 67 and alanine at amino acid position 164, and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His). 

7. An isolated nucleic acid according to claim 6, wherein the nucleic acid further encodes leucine at amino acid position 65. 

8. A transformed cell that expresses a protein encoded by a nucleic acid of claim 1. 

9. A vector comprising a nucleic acid of claim 1. 

10. A transformed cell comprising a vector of claim 9. 

11. A transformed cell that expresses a protein encoded by the nucleic acid of claim 1 fused to a protein encoded by a second nucleic acid of interest. 

12. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") wherein the engineered GFP comprises leucine at amino acid position 65, said engineered GFP having a cellular fluorescence that is at least five times greater than wild type GFP. 

13. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") according to claim 12, wherein the engineered GFP has threonine at amino acid position 168. 

14. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") according to claim 13, wherein the engineered GFP has cysteine at amino acid position 66. 

15. An isolated blue fluorescent protein ("BFP") that comprises histidine at amino acid position 67 and leucine at amino acid position 65 and has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His). 

16. An isolated blue fluorescent protein ("BFP") that has a histidine at amino acid position 67 and an alanine at amino acid position 164, that has a cellular fluorescence that is at least five times greater than that of BFP(Tyr67→His). 

17. An isolated blue fluorescent protein ("BFP") according to claim 16, wherein the BFP further has leucine at amino acid position 65. 

18. A method of detecting and optionally isolating an engineered cell that contains a selected nucleic acid which encodes a selected protein or nucleic acid, comprising: 
a) stably introducing into a host cell in a population of host cells a vector that contains a first nucleic acid which encodes a polypeptide selected from the group consisting of SG11, SG12, SG25, SB42, SB49, SB50 and a second nucleic acid which encodes a selected protein or nucleic acid, and 
b) detecting cells in the population of host cells that express SG11, SG12, SG25, SB42, SB49, or SB50, and optionally sorting cells that express SG11, SG12, SG25, SB42, SB49, or SB50 with a fluorescence-activated cell sorter to isolate individual cells that express said fluorescent protein. 

19. A nucleic acid construct wherein a coding sequence selected from the group consisting of sequences that encode SG11, SG12, SG25, SB42, SB49, and SB50 is operably linked to a regulatory sequence of a selected gene. 

20. A nucleic acid construct wherein a first coding sequence that encodes a selected polypeptide is fused using genetic engineering to a second coding sequence selected from the group consisting of sequences that encode SG11, SG12, SG25, SB42, SB49, and SB50, such that expression of the fused sequence yields a fluorescent hybrid protein in which the polypeptide encoded by the first coding sequence is fused to the polypeptide encoded by the second coding sequence. 

21. A method of detecting and characterizing regulatory and coding sequence elements that regulate subcellular expression and targeting of proteins, comprising: 
a) expressing in an engineered cell, in the presence and absence of selected culture conditions and components, a nucleic acid wherein a first nucleic acid selected from the group consisting of nucleic acids that encode SG11, SG12, SG25, SB42, SB49, and SB50 is operably linked to a second nucleic acid derived from a selected gene; 
b) detecting the presence and subcellular localization of fluorescent signal. 